Embedding intelligence in enhanced music mapping agents

By

MARNITZ CORNELL GRAY

DISSERTATION

submitted in fulfilment of the requirements for the degree

MASTER OF SCIENCE

in

COMPUTER SCIENCE

in the

FACULTY OF SCIENCE

at the

UNIVERSITY OF JOHANNESBURG

SUPERVISOR: PROF. E.M. EHLERS

SEPTEMBER 2007

Abstract

Keywords: Pluggable Intelligence, Intelligent Music Selection

Artificial Intelligence has been an increasing focus of study in recent years. Agent technology has emerged as the preferred model for simulating intelligence [Jen00a]. Focus is now turning to inter-agent communication [Jen00b] and to agents that can adapt to changes in their environment.

Digital music has been gaining in popularity over the past few years. Devices such as Apple's iPod have sold millions of units. These devices are capable of holding thousands of songs. Managing such a device and selecting a list of songs to play from so many can be a difficult task.

This dissertation expands on agent types by creating a new agent type known as the Modifiable Agent. The Modifiable Agent type defines agents which have the ability to modify their intelligence depending on what data they need to analyse. This allows an agent to, for example, change from being a goal-based agent to a learning-based agent, or allows an agent to modify the way in which it processes data.

Digital music is a growing field, with devices such as the Apple iPod revolutionising the industry. These devices can store large numbers of songs and, as such, are difficult to navigate because they usually do not include input devices such as a mouse or keyboard. Creating a playlist can therefore be a tiresome process, which can lead to the user playing the same songs over and over. The goal of the dissertation is to provide research into methods of automatically creating a playlist from a user-selected song, i.e. once a user selects a song, a list of similar music is automatically generated and added to the user's playlist. This simplifies the task of selecting music and adds diversity to the songs which the user listens to.

The dissertation introduces intelligent music selection, that is, selecting a playlist of songs based on music classification techniques and past human interaction.


Table of Contents

ABSTRACT ...... II

TABLE OF CONTENTS ...... IV

LIST OF FIGURES ...... XII

LIST OF TABLES ...... XIII

LIST OF ABBREVIATIONS...... XIII

1 INTRODUCTION ...... 1

1.1 Introduction ...... 1

1.2 Problem Statement...... 1

1.3 Chapter Outlay ...... 3
1.3.1 Agent Technology ...... 3
1.3.2 Digital Music ...... 3
1.3.3 Model Design ...... 4
1.3.4 The Prototype ...... 4
1.3.5 Conclusions and Future Research ...... 4

1.4 Conclusion ...... 5

2 INTELLIGENT AGENTS ...... 6

2.1 Introduction ...... 6

2.2 What is an Agent? ...... 7

2.3 Agent Types ...... 9


2.4 Properties or characteristics of Agents ...... 13
2.4.1 General agent properties ...... 13
2.4.2 Deliberate agent properties ...... 14
2.4.3 More advanced agent properties ...... 15

2.5 Classifying Agents ...... 16

2.6 Agent Languages ...... 17

2.7 Agent-Oriented Programming ...... 18

2.8 Conclusion ...... 20

3 INTELLIGENCE AS A PLUG-IN FOR AGENTS ...... 22

3.1 Introduction ...... 22

3.2 Why Embed Intelligence? ...... 22

3.3 Methods of Embedding Intelligence ...... 23
3.3.1 Compile Time ...... 23
3.3.2 Run Time ...... 24

3.4 Conclusion ...... 31

4 THE MODIFIABLE AGENT TYPE ...... 33

4.1 Introduction ...... 33

4.2 Why Create Agent Type Definitions? ...... 33

4.3 Description of a Modifiable Agent ...... 34

4.4 Why create an agent that can modify itself? ...... 34

4.5 Disadvantages of the Modifiable Agent ...... 35


4.6 Self-Aware Agents ...... 36

4.7 The Self-Aware Modifiable Agent ...... 37

4.8 Conclusion ...... 38

5 BACKGROUND TO DIGITAL MUSIC ...... 40

5.1 Introduction ...... 40

5.2 History of Digital Music ...... 40

5.3 Growth of Digital Music ...... 41

5.4 The Impact of digital music ...... 43

5.5 Current Digital Music Technologies ...... 44
5.5.1 Introduction ...... 44
5.5.2 Pulse Code Modulation (PCM) ...... 44
5.5.3 MPEG-1 Audio Layer 3 (MP3) ...... 45
5.5.4 Ogg Vorbis (Ogg) ...... 48
5.5.5 Windows Media Audio (WMA) ...... 49
5.5.6 MPEG-2 AAC ...... 50

5.6 Conclusion ...... 51

6 CURRENT AGENTS IN DIGITAL MUSIC ...... 55

6.1 Introduction ...... 55

6.2 MusicBrainz and Picard ...... 55

6.3 MoodLogic ...... 57

6.4 FixTunes ...... 57

6.5 Relatable ...... 58

6.6 MusicMagic and MusicIP ...... 58

6.7 Beat Tracking System ...... 61

6.8 Conclusion ...... 61

7 AUDIO SIGNAL CLASSIFICATION ...... 63

7.1 Introduction ...... 63

7.2 Beat Tracking ...... 63

7.3 Fourier analysis ...... 64

7.4 Constant-Q analysis...... 65

7.5 Wavelet Transforms ...... 66

7.6 Pitch analysis ...... 69

7.7 Data Reduction (Feature Extraction) ...... 71
7.7.1 Physical Features ...... 71
7.7.2 Perceptual Features ...... 74

7.8 Significant Patterns...... 76

7.9 Clustering ...... 76
7.9.1 Neural Nets ...... 77
7.9.2 Successive Restriction ...... 79
7.9.3 K-Means Clustering ...... 80

7.10 Analysis Duration ...... 81
7.10.1 Fixed Analysis Time Frame ...... 82
7.10.2 Multi Resolution Analysis ...... 83

7.11 Conclusion ...... 84


8 MODEL – THE INTELLIGENT MUSIC MIXING AGENT (IMMA) ...... 86

8.1 Introduction ...... 86

8.2 The IMMA ...... 86

8.3 The Environment ...... 87
8.3.1 Processing power ...... 87
8.3.2 Files to be analysed ...... 92

8.4 Embedding Intelligence into the Agent ...... 92
8.4.1 How the IMMA Perceives its Environment ...... 92
8.4.2 Different Intelligence Schemes ...... 93

8.5 IMMA Skills ...... 97
8.5.1 Clustering ...... 97
8.5.2 Selection ...... 99
8.5.3 Rating ...... 100
8.5.4 Conclusion ...... 101

8.6 Knowledge Obtained ...... 102
8.6.1 Introduction ...... 102
8.6.2 The Database ...... 102
8.6.3 Structure of the database ...... 104

8.7 Chapter Conclusion ...... 106

9 PROTOTYPE – THE ENHANCED MUSIC MAPPING AGENT (EMMA) .. 107

9.1 Introduction ...... 107

9.2 EMMA ...... 107

9.3 The Interfaces ...... 108
9.3.1 The IPlug-in Interface ...... 110


9.3.2 The IHost Interface ...... 112
9.3.3 The IDatabase Interface ...... 113
9.3.4 The IMain Interface ...... 115

9.4 Conclusion ...... 116

10 THE COMPONENTS OF THE EMMA ...... 117

10.1 Introduction ...... 117

10.2 The Main Agent Program ...... 117

10.3 The Intelligence Chooser ...... 119

10.4 The Intelligence Plug-ins ...... 120

10.5 The Database Module ...... 120

10.6 User Interaction Module ...... 121
10.6.1 Introduction ...... 121
10.6.2 Skipping a song ...... 121
10.6.3 The sliding scale rating system ...... 123
10.6.4 Playing the full length of a song ...... 124
10.6.5 User interaction in selecting a song ...... 125
10.6.6 Section conclusion ...... 126

10.7 Playlist generator ...... 126
10.7.1 Introduction ...... 126
10.7.2 Beat selection ...... 126
10.7.3 Histogram selection ...... 127
10.7.4 Rating based ...... 127

10.8 Conclusion ...... 127


11 ANALYSIS OF THE EMMA ...... 129

11.1 Introduction ...... 129

11.2 Motivation and description of the EMMA ...... 129

11.3 The PEAS description of the agent...... 129

11.4 Properties of the task environment ...... 131

11.5 Conclusion ...... 132

12 OPERATION OF THE EMMA ...... 133

12.1 Introduction ...... 133

12.2 The Main Agent Window ...... 133

12.3 The User Interaction Module ...... 133

12.4 Selecting Components ...... 134
12.4.1 The Automatic Chooser ...... 136
12.4.2 The Manual Chooser ...... 137

12.5 Song Rating ...... 137

12.6 Playlist generation ...... 138

12.7 Conclusion ...... 139

13 CONCLUSION AND FURTHER RESEARCH ...... 141

13.1 Conclusion ...... 141
13.1.1 Pluggable Intelligence ...... 141
13.1.2 Digital Music ...... 142


13.2 Further Research ...... 143
13.2.1 Module loading ...... 143
13.2.2 Analytical techniques ...... 143
13.2.3 User interaction ...... 144
13.2.4 Play list generation ...... 144

13.3 Final Word ...... 144

BIBLIOGRAPHY ...... I


List of Figures

Figure 3.1 – Different schemes for an Intelligence Chooser ...... 25
Figure 3.2 – The Agent Academy Agent Creation Process ...... 30
Figure 5.1 – Ratings of different sound encoding ...... 52
Figure 5.2 – Comparison of average file size ...... 53
Figure 6.1 – Picard profiling an album ...... 56
Figure 6.2 – The Mood Option for MusicMagic ...... 59
Figure 6.3 – Predixis MusicMagic Mixer ...... 60
Figure 7.1 – An example of a Fourier Transform ...... 65
Figure 7.2a – Signal ...... 68
Figure 7.3 – Pitch histogram of a jazz song and an Irish folk song ...... 70
Figure 7.4 – Examples of Zero-Cross rate ...... 72
Figure 7.5 – Spectrum and Wave Form analysis of several instruments ...... 73
Figure 7.6 – An example of a neural network ...... 78
Figure 7.7 – An example of a k-means classification for three classes ...... 81
Figure 8.1 – The IMMA ...... 86
Figure 8.2 – A comparison of classification methods on various data-sets ...... 94
Figure 8.3 – Beat histograms for classical and contemporary popular music ...... 95
Figure 9.1 – The EMMA Module Layout ...... 108
Figure 9.2 – The EMMA Interfaces ...... 109
Figure 9.3 – The IPlugin Interface ...... 111
Figure 9.4 – The IHost Interface ...... 113
Figure 9.5 – The IDatabase Interface ...... 115
Figure 9.6 – The IMain Interface ...... 115
Figure 10.1 – UML diagram of the EMMA ...... 118
Figure 10.2 – Examples of different song rating ...... 124
Figure 12.1 – The EMMA and Winamp on Start-up ...... 134
Figure 12.2 – The Plugin Chooser ...... 135
Figure 12.3 – The Automatic Chooser ...... 136


Figure 12.4 – The Manual Chooser ...... 137
Figure 12.5 – Rating Changes ...... 138
Figure 12.6 – Song rating in the database ...... 138
Figure 12.7 – EMMA Playlist selection ...... 139

List of Tables

Table 5.1 – Trade Revenues by Format ...... 42
Table 5.2 – 2005 Year End Statistics ...... 42
Table 11.1 – PEAS table ...... 130

List of Abbreviations

AAC - Advanced Audio Coding
ACL - Agent Communication Language
ADSL - Asymmetric Digital Subscriber Line
AOP - Agent-Oriented Programming
APE - Agent Prototyping Environment
ARPA - Advanced Research Projects Agency
ASC - Audio Signal Classification
ASF - Advanced Systems Format
BDI - Belief-Desire-Intention
BRML - Business Rules Markup Language
BTS - Beat Tracking System
CDDB - Compact Disc Database
CORBA - Common Object Request Broker Architecture
CLP - Courteous Logic Program
DAML - DARPA Agent Markup Language


DARPA - Defense Advanced Research Projects Agency
DBMS - Database Management System
DLL - Dynamic-Link Library
DRM - Digital Rights Management
DWT - Discrete Wavelet Transform
EMMA - Enhanced Music Mapping Agent
FIPA - Foundation for Intelligent Physical Agents
HMMs - Hidden Markov Models
IMMA - Intelligent Music Mixing Agent
KSE - Knowledge Sharing Effort
KQML - Knowledge Query and Manipulation Language
LBDM - Local Boundary Detection Model
MPEG-2 - Moving Picture Experts Group 2
MP3 - MPEG-1 Audio Layer 3
Ogg - Ogg Vorbis
PCM - Pulse Code Modulation
PDF - Portable Document Format
RADL - Reticular Agent Definition Language
RAISE - Reusable Agent Intelligence Software Environment
RIAA - Recording Industry Association of America
RDF - Resource Description Framework
RPC - Remote Procedure Call
SBR - Spectral Band Replication
SQL - Structured Query Language
STFT - Short-Time Fourier Transform
WAN - Wide Area Network
WMA - Windows Media Audio
WWW - World Wide Web
XML - Extensible Markup Language
ZCR - Zero-Crossing Rate


1 Introduction

1.1 Introduction

Artificial Intelligence is a much debated subject. People dream of machines that will understand their commands and do their chores. Even Hollywood portrays scenes of machines taking over the world in very violent ways.

Artificial Intelligence is such a broad topic that researchers tend to focus on specific areas of research rather than the broader whole.

Every day new technology is invented. Faster methods for performing calculations are discovered, better, ever-faster processors are created and storage capacity grows constantly. Software development has fallen behind the rapid pace of hardware development. The need for a new way of developing software has emerged – one where individual components of software can be added or removed, without the need to modify the entire package.

This dissertation focuses on agent research and specifically, agents that are able to adapt to different environments.

1.2 Problem Statement

Current agents are limited in what they are able to do by the state of current technology and by research into specialised fields. If new analytical techniques are developed, the agent program has to be completely rewritten to take advantage of them. Take, for example, a mobile agent with the freedom to roam the internet. This agent is limited by the way it was originally programmed. If it stumbles across technology that would allow it to improve upon itself, it will have to

report back to the original designers so that they may incorporate the technology in its programming.

The aim is to move to a more open method of agent design, where components such as the agent's intelligence can be exchanged or extended using simple hooks. Cross-platform environments (Sun Java, Microsoft .NET) are becoming increasingly popular, which makes this solution more viable.

Take again the mobile agent described earlier. If the mobile agent were to stumble across a document encoded in a format it did not understand (for example, a Portable Document Format (PDF) document), it could simply search a repository of agent intelligence for a plug-in that enables it to read PDF documents. Should it find one, it could plug this in and continue reading the document. This allows the agent to continue functioning while saving the original programmer time.

Digital music is a growing field, with devices such as the Apple iPod revolutionising the industry. These devices can store large numbers of songs and, as such, are difficult to navigate because they usually do not include input devices such as a mouse or keyboard. Creating a playlist can therefore be a tiresome process, which can lead to the user playing the same songs over and over.

The goal of the dissertation is to provide research into methods of automatically creating a playlist from a user-selected song, i.e. once a user selects a song, a list of similar music is automatically generated and added to the user's playlist. This simplifies the task of selecting music and adds diversity to the songs which the user listens to.


1.3 Chapter Outlay

This dissertation can be divided into five main sections. They are as follows:

1.3.1 Agent Technology

Chapter 2 covers agent technology, the definition of an intelligent agent and how to classify a particular software agent. This chapter provides a limited background to agents and should be supplemented with other papers.

In chapter 3, the concept of pluggable technology for agent intelligence is introduced. This chapter covers reasons for embedding intelligence into agents as well as the different options for embedding intelligence into an agent.

A new agent type, the Modifiable agent, is discussed and explained in chapter 4. This chapter covers reasons for creating a new agent type, as well as the requirements of such an agent type.

1.3.2 Digital Music

Digital music has come a long way since 16-bit Pulse Code Modulation (PCM) encoding. Digital music encoding techniques, such as Moving Picture Experts Group 1 (MPEG-1) Audio Layer 3 (MP3), allow the user to store many times more songs on a compact disc. Chapter 5 introduces the subject of digital music. It covers the advantages and disadvantages of the major digital music technologies and formats in use at the time of writing.


Chapter 6 covers the limited number of agents in the field of digital music at the time of writing. It analyses their functions as well as how well they perform.

Digital music analysis, or digital signal analysis, is an ever-growing field. Chapter 7 introduces some of the more common techniques for analysing digital music signals.

1.3.3 Model Design

Chapter 8 introduces key concepts in the design of an intelligent music selection agent. It describes the goals to be achieved in an idealised world without practical constraints. This chapter lays the foundation for the prototype.

1.3.4 The Prototype

The prototype section is split into four chapters. Chapter 9 describes the skeleton structure of the Enhanced Music Mapping Agent (EMMA), the prototype implementation of the IMMA. It introduces the interfaces that are used to allow communication between the different modules.

Chapter 10 provides an in-depth look at the different modules that make up the EMMA. All of these components are modifiable.

Chapter 11 provides an analytical viewpoint of the EMMA as a whole. It uses techniques of other researchers to classify and describe the agent. Chapter 12 demonstrates the operation of the EMMA prototype.

1.3.5 Conclusions and Future Research

The dissertation concludes in chapter 13 by providing an overview of the entire dissertation as well as areas where future research is possible.

1.4 Conclusion

This chapter serves as an introduction to the dissertation. It provides information on the chapter breakdown, topics that are discussed in each chapter, and classifies chapters into broader groups.


2 Intelligent Agents

"The hardest thing to understand is why we can understand anything at all"

Albert Einstein

2.1 Introduction

Agent technology is a relatively new branch of Artificial Intelligence. First conceived by Marvin Minsky in a paper published in 1986, its uptake only really occurred in the early to mid-1990s [Mid00]. The reason for the slow uptake was that researchers had historically tended to focus on the various components of intelligent behaviour (learning, reasoning, problem solving, vision understanding, etc.) in isolation [Jen98].

Agents are being used in an increasingly wide variety of applications, ranging from comparatively small systems such as e-mail filters to large, open, complex, mission critical systems such as air traffic control [Jen98].

Agents provide designers and developers with a way of structuring an application around autonomous, communicative components. They offer a new and often more appropriate route to the development of complex systems [Luc03]. Multi-agent systems offer strong models for representing real-world environments with an appropriate degree of complexity and dynamism. For example, simulations of economies, societies and biological environments are typical application areas [Luc03].


2.2 What is an Agent?

Defining exactly what an agent is, is a difficult task. The definitions of researchers in the field vary from complex definitions such as: “An agent is a computer system, situated in some environment, that is capable of flexible autonomous action in order to meet its design objectives” [Jen98], to overly simple definitions such as: “A software agent is a program that performs tasks for its user” [Fon99]. Nwana has even resigned himself to saying that the definition will never be agreed upon: “We have as much chance of agreeing on a consensus definition for the word 'agent' as AI researchers have of arriving at one for 'artificial intelligence' itself - nil!” [Nwa96].

To build an understanding of what an agent is, a good place to start is Webster's definition of a software agent:

“Software Agent: Any software that is designed to use intelligence to automatically carry out an assigned task, mainly retrieving and delivering information.” (Webster's New Millennium Dictionary of English, Preview Edition (v 0.9.6))

This definition may seem overly simplified. Before examining Webster’s definition, Luck’s definition of an agent is provided:

“Agents can be defined to be autonomous, problem-solving computational entities capable of effective operation in dynamic and open environments.”[Luc03, Luc04]

Two key terms are present in Luck’s definition of agents. Firstly, autonomous: Autonomy means that the agent should be able to act without direct intervention of humans or other agents. Secondly, problem-solving: Problem-solving means that the agent has a specific problem that it is designed to solve.


Comparing the two definitions, Webster's is not all that far off. It includes the key terms autonomous (“automatically”) and problem-solving (“carry out an assigned task”). Where it falls short is in its use of the word ‘intelligence’. This definition therefore excludes mobile agents.

Luck’s definition is quite similar to that of Jennings, quoted at the beginning of this section, the difference being that Luck restricts the environment in which agents may operate.

Russell & Norvig define an agent as: “An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators” [Rus03].

This definition depends heavily on what the environment is interpreted to be as well as sensing and acting to mean. As Franklin and Graesser point out: “If we define the environment as whatever provides input and receives output, and take receiving input to be sensing and producing output to be acting, every program is an agent. Thus, if we want to arrive at a useful contrast between agent and program, we must restrict at least some of the notions of environment, sensing and acting” [Fra96].

The final definition of an agent comes from Franklin and Graesser: “A system situated within and a part of an environment that senses that environment and acts upon it, over time, in pursuit of its own agenda and so as to effect what it senses in the future” [Fra96].

Franklin and Graesser seem to have created a good definition of agents. They have restricted the environment and have not made a requirement that the agent acts intelligently.


This shows some of the viewpoints that different authors hold as to what an agent actually is. The general consensus would seem to be that a software agent is a piece of software or a system, situated in an environment, which senses this environment and acts upon those sensory inputs. The agent must do so automatically and must pursue some goal (i.e. its actions must not be random).

The subject of defining an agent could well be argued for a very long time to come. The next section looks at some of the agent types that have been developed over the past few years, along with their advantages and disadvantages.

2.3 Agent Types

Nwana discusses six different types of agents: collaborative agents, interface agents, mobile agents, information/internet agents, reactive agents and hybrid agents [Nwa96]. The following section discusses each of these agent types as well as three others, namely intentional agents, social agents and abstract agents.

Collaborative agents: Collaborative agents emphasise autonomy and co-operation (with other agents) in order to perform tasks. They may learn, but this aspect is not typically a major emphasis of their operation [Nwa96].

Interface agents: Interface agents emphasise autonomy and learning in order to perform tasks for their owners. The key metaphor underlying interface agents is that of a personal assistant who is collaborating with the user in the same work environment. Note the subtle distinction between collaborating with the user and collaborating with other agents, as is the case with collaborative agents. Collaborating with a user may not require an explicit agent communication language, as is required when collaborating with other agents [Lib97, Nwa96].


Mobile agents: Mobile agents are computational software processes capable of roaming wide area networks (WANs) such as the world wide web (WWW), interacting with foreign hosts, gathering information on behalf of their owners and then returning after having performed the duties set by their users. Mobile agents tend to save their running state before migrating to a new host, and then restore this state once they have migrated. A mobile agent's duties may range from making a flight reservation to managing a telecommunications network. However, mobility is neither a necessary nor a sufficient condition for agenthood. Mobile agents are agents because they are autonomous and they co-operate, albeit differently to collaborative agents [Kotz99].

Information/Internet Agents: Information agents have come about because of the sheer demand for tools to help manage the explosive growth of information currently being experienced, which will continue henceforth. Information agents perform the role of managing, manipulating or collating information from many distributed sources. There is a rather fine distinction between information agents and some of those which have earlier been classed as interface or collaborative agents. Interface or collaborative agents started out quite distinct, but with the explosion of the WWW and because of their applicability to this vast WAN, there is now a significant degree of overlap. This is inevitable, especially since information or internet agents are defined using different criteria: they are defined by what they do, in contrast to collaborative or interface agents, which are defined by what they are [Nwa96].

Reactive Software Agents: Reactive agents represent a special category of agents which do not possess internal, symbolic models of their environments. Instead, they act or respond in a stimulus-response manner to the present state of the environment in which they are embedded. An important point of note with reactive agents is that the agents are relatively simple and they interact with other agents in basic ways. Nevertheless, complex patterns of

behaviour emerge from these interactions when the ensemble of agents is viewed globally. Three key ideas underpin reactive agents. Firstly, emergent functionality: the dynamics of the interaction lead to the emergent complexity, meaning that there is no prior specification or plan of the behaviour of reactive agents. Secondly, task decomposition: a reactive agent is viewed as a collection of modules which operate autonomously and are responsible for specific tasks. These modules communicate at a very low level and communication is kept to a minimum. No global model exists within any of the agents and, hence, the global behaviour has to emerge. Thirdly, reactive agents tend to operate on representations which are close to raw sensor data, in contrast to the high-level symbolic representations that abound in the other types of agents discussed so far [Nol02].

Intentional Agents: This agent type emerged from Classical Artificial Intelligence research into planning and reasoning. The majority of intentional agent architectures implement some form of Belief-Desire-Intention (BDI) model of reasoning. These systems contain an explicitly represented, symbolic model of the world, in which decisions are made via logical reasoning, based upon pattern-matching and symbolic manipulation [Jen95, Dig97].

Social Agents: Social agents focus upon the data stores and algorithms necessary to support co-operation, co-ordination, and collaboration between multiple agents. This interaction is supported through the specification of a common interaction language, known as an Agent Communication Language (ACL), in conjunction with various infrastructure components to support the transmission of messages within this language, and finally some internal modules to enable the agent to reason about the current conversations it is pursuing [Col02].

Abstract Agents: In terms of an abstract type, the Foundation for Intelligent Physical Agents (FIPA) has focused upon issues such as message transport

interoperability, supporting multiple forms of Agent Communication Languages, supporting various forms of content language, and supporting alternate directory service models rather than issues such as agent lifecycle representation, agent mobility, or agent domains. The aim behind this approach is to standardise the part of an agent architecture that is responsible for inter-agent communication. Doing so delivers a more transparent approach to agent interaction that supports agent interoperability between agents produced upon alternate vendor platforms [Col02].

Hybrid Agents: Hybrid agents refer to those whose constitution is a combination of two or more agent philosophies within a singular agent. These philosophies include a mobile philosophy, an interface agent philosophy, collaborative agent philosophy, etc. The key hypothesis for having hybrid agents or architectures is the belief that, for some applications, the benefits accrued from having the combination of philosophies within a singular agent are greater than the gains obtained from the same agent based entirely on a singular philosophy [Nwa96].

This section introduces some of the agent types that have been developed over the past few years. In the ever-changing world of computer science, and more specifically agent theory, new agent types are created regularly. There are many other agent types that were not discussed here.

The chapter provides a foundation for chapters 3 and 4, where the concept of plug-in intelligence is introduced as well as a new agent type: The Modifiable Agent.

The next section introduces some of the typical properties (also known as characteristics) of agents.


2.4 Properties or characteristics of Agents

Properties of agents are characteristics that allow humans to classify agents into the types mentioned in the previous section.

Goodwin categorises agent properties by distinguishing between general agents and deliberate agents [Goo95].

2.4.1 General agent properties

General agent properties relate to those properties that all agents should exhibit. These include being successful, capable, perceptive, reactive, reflexive and goal-oriented.

Successful: An agent is successful to the extent that it accomplishes the specified task in the given environment [Goo95]. A successful agent is therefore one that accomplishes its goal(s) in a predictable manner (i.e. not by random chance) by sensing and acting upon its environment.

Capable: An agent is capable if it possesses the effectors needed to accomplish the task [Goo95]. Therefore, a capable agent is one that has the ability to accomplish its goal(s). If an agent is not capable of accomplishing its goals, then it cannot be successful.

Perceptive: An agent is perceptive if it can distinguish salient characteristics of the world that would allow it to use its effectors to achieve the task [Fra96, Goo95].

Reactive: An agent is reactive if it is able to respond sufficiently quickly to events in the world to allow it to be successful [Fra96, Goo95].


Reflexive: An agent is reflexive if it behaves in a stimulus-response fashion [Fra96, Goo95].

From the definitions of an agent discussed at the beginning of this chapter, an addition to Goodwin’s properties of agents could be: Goal-oriented – The agent does not simply act in response to the environment but attempts to accomplish its tasks or goals. This is also known as pro-active or purposeful.

2.4.2 Deliberate agent properties

The previous section described properties that are normally associated with reactive agents. This section describes properties that are generally associated with deliberate agents.

Predictive: An agent is predictive if its model of how the world works is sufficiently accurate to allow it to correctly predict how it can achieve the task [Goo95].

Interpretive: An agent is interpretive if it can correctly interpret its sensor readings [Goo95]. In other words, an interpretive agent is an agent that can turn data from its sensors into information.

Rational: An agent is rational if it chooses to perform commands that it predicts will achieve its goals [Goo95]. A rational agent cannot therefore make any decision based on a random guess. However, if it has more than one path that will provide the desired result, a path may be chosen at random.

Sound: An agent is sound if it is predictive, interpretive and rational, i.e. it incorporates all the properties mentioned above [Goo95]. Therefore an agent is


sound if it can correctly predict how to achieve its task, turn data from its sensors into information and choose to perform commands which will achieve its goals.

2.4.3 More advanced agent properties

The following agent properties are properties that are generally tied to specific types of agents (for example, a mobile agent has a mobility property).

Adaptive: The agent adapts or modifies its state and behaviour according to new environmental conditions [Mae94].

Communicative: An agent is communicative if it communicates with other agents, perhaps including people. This is also known as socially able [Jen95].

Collaborative: The agent co-operates and negotiates with other agents to achieve its goal. This differs from communicative in that, with the latter, only information is passed between the agents; a collaborative agent may also perform a task on behalf of another agent [Dil99].

Learning: The agent adapts and changes its behaviour based on its previous experience. A learning agent will learn and adapt to changing circumstances, or changes in its environment [Dil99].

Mobile: The agent is able to transport itself from one machine to another [Whi97]. Flexible: The agent's actions are not scripted [Whi97].

The next section describes how to classify agents using a number of parameters including the properties discussed in this section.


2.5 Classifying Agents

Agents may be classified using a number of parameters. The most popular method is to use the properties of the agent (reactive, learning, etc.) and its environment (database, network, internet) to classify it. Other factors include the language in which the agent was written, the type of control mechanism (algorithmic, rule based, neural net) and the tasks it performs (for example, e-mail filtering).

Agents can be classified by any number of the methods described above. This produces a hierarchical classification based on set inclusion. For example, a mobile communicative agent is a subclass of mobile agents. This allows for more descriptive agent classifications.

Take, for example, an internet-based agent that indexes content on the internet for a search engine. This agent works by downloading a resource from another server, looking for keywords in the document and then searching the contents for hyperlinks. If it finds a hyperlink, it follows it and continues indexing. If this agent cannot decipher the contents of a document, it communicates with other agents that it is aware of to see if they can decipher the document.

This agent can be classified by the following properties:
1. It is reactive, as it follows links found by its sensors.
2. It is adaptive, as it attempts to find a solution to deciphering a document should it be unable to do so itself.
3. It is communicative and collaborative, as it works with other agents to decipher content.
The agent is therefore a hybrid agent (an adaptive, communicative, collaborative and reactive agent). A minimal sketch of this style of classification by set inclusion follows.
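
To make the set-inclusion idea concrete, the following minimal Java sketch classifies an agent by checking whether its property set contains the defining properties of a type. The enumeration values and the encoding of the indexing agent are illustrative assumptions invented for this example, not part of any established taxonomy:

    import java.util.EnumSet;
    import java.util.Set;

    enum AgentProperty { REACTIVE, ADAPTIVE, COMMUNICATIVE, COLLABORATIVE, MOBILE, LEARNING }

    public class AgentClassifier {
        // Classification by set inclusion: an agent belongs to a type if its
        // property set contains all of that type's defining properties.
        static boolean belongsTo(Set<AgentProperty> agent, Set<AgentProperty> type) {
            return agent.containsAll(type);
        }

        public static void main(String[] args) {
            // The indexing agent from the example above, encoded as a property set.
            Set<AgentProperty> indexer = EnumSet.of(
                    AgentProperty.REACTIVE, AgentProperty.ADAPTIVE,
                    AgentProperty.COMMUNICATIVE, AgentProperty.COLLABORATIVE);

            System.out.println(belongsTo(indexer, EnumSet.of(AgentProperty.COMMUNICATIVE))); // true
            System.out.println(belongsTo(indexer, EnumSet.of(AgentProperty.MOBILE)));        // false
        }
    }

Because set inclusion is transitive, any agent classified as, say, mobile and communicative is automatically also a member of the broader mobile class, which gives the hierarchical structure described above.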

This section provides information on how to classify agents into the agent types discussed earlier in this chapter.


2.6 Agent Languages

This section provides information on agent languages that have been developed. It provides a background to current research into agent interoperability. Communication language standards facilitate the creation of interoperable software by decoupling implementation from interface. As long as programs and agents abide by the details of the standards, it does not matter how they are implemented [Gen94].

Inter-agent languages have been developed to allow agents to share information. This can range from sharing function or procedure calls to act upon data, to sharing experiences and strategies.

Common Object Request Broker Architecture (CORBA) and Remote Procedure Call (RPC) are examples of technologies that allow agents to share objects, procedure calls and data structures. This allows lightweight agents to be developed, as they can call upon other agents to transform data they have captured [Alek03].

The Advanced Research Projects Agency (ARPA) Knowledge Sharing Effort (KSE) is a consortium to develop conventions facilitating sharing and re-use of knowledge bases and knowledge based systems. Its goal is to define, develop, and test infrastructure and supporting technology to enable participants to build much bigger and more broadly functional systems than could be achieved working alone [Nec94].

Knowledge Query and Manipulation Language (KQML) provides an abstraction of an information agent in a knowledge-based system. KQML avoids the limitations of a simple remote procedure call or relational database query and makes it easier


to integrate intelligent agents with simpler and more mundane information clients and servers [Fin94].
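
As an illustration, a KQML message consists of a performative followed by keyword parameters. The example below is written in the style of the messages described in [Fin94]; the agent names, ontology and query content are assumptions invented for this example:

    (ask-one
      :sender      stock-client
      :receiver    stock-server
      :reply-with  query1
      :language    standard_prolog
      :ontology    NYSE-TICKS
      :content     "price(IBM, Price)")

The performative (ask-one) expresses the intent of the message, while the :content field carries the actual query in a separately declared content language. The receiving agent need not share the sender's implementation, only the message conventions.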

The Defense Advanced Research Projects Agency (DARPA) Agent Markup Language (DAML) program is a United States government sponsored endeavour aimed at providing the foundation for the next web evolution – the semantic web. DAML allows web pages, databases, programs, models, and sensors to be linked together by agents that use DAML to recognize the concepts they are looking for [Hen00]. DAML is being developed as an extension to the Extensible Markup Language (XML) and the Resource Description Framework (RDF).

This section describes some of the agent languages that have been developed previously. Creating a common language is key in allowing agents to communicate with other agents, thereby allowing them to use each other in providing better results.

The next section introduces a number of agent programming languages that have been developed. Most have been based on the agent languages described in this section.

2.7 Agent-Oriented Programming

This section describes a number of agent programming environments that have been developed. Most have been modelled on the agent languages which were described in the previous section. This section provides more information on the current research into agent interoperability.

Agent-Oriented Programming (AOP) is a programming paradigm that allows multiple agents to interact with one another. In this paradigm, an agent is viewed as an autonomous software entity that encapsulates a set of capabilities. Computation is realised through social interactions within a community of agents.

AgentSpeak(L) is an AOP language developed by Rao. It is founded upon the Procedural Reasoning System and the distributed Multi-Agent Reasoning System [Rao96].

AgentSpeak(L) is a programming language based on a restricted first-order language with events and actions. It does not explicitly represent notions of belief, desire, and intention as modal formulas; instead, these notions are applied from an external perspective by the designers of the agent [Rao96].

AgentBuilder is a commercial Agent Prototyping Environment (APE). Underpinning AgentBuilder is an Agent-Oriented Programming (AOP) language entitled the Reticular Agent Definition Language (RADL). However, AgentBuilder extends this basic language with a toolkit that supports agent development. Inter-agent communication within AgentBuilder is realised through KQML [Col02].

Agents built using AgentBuilder execute in an environment known simply as the AgentBuilder Run-Time System. The run-time system is developed using Java. This allows an AgentBuilder agent to execute on any Java virtual machine.

The AgentBuilder development environment known as the AgentBuilder Toolkit provides agent developers with a number of tools. One such tool is known as the Agency Manager. The Agency Manager monitors two or more agents (identical or otherwise), and allows the developer to see any communication that happens between these agents [Age07a].

An example of a software agent developed using AgentBuilder is P-Mail. P-Mail is a mail system that ensures privacy by never storing any message on any machine other than the sending machine or the receiving machine. It achieves this using peer-to-peer communication [Age07b].


JACK Intelligent Agents is a framework in Java for multi-agent system development. JACK was developed by Agent Oriented Software. The JACK framework supplies a high performance, lightweight implementation of the Belief-Desire-Intention (BDI) architecture, and can be extended to support different agent models [Bus99]. JACK agents are rational agents that are based in an environment of which they have little to no knowledge. The agent thus has beliefs about the world and desires to satisfy, driving it to form intentions to act, hence the BDI architecture [Bus99]. JACK encourages developers to program in Java as well as in its framework.

This section provides information on some of the programming environments available for programming agents. It shows the ongoing research into inter-agent operability in that all the programming environments use some form of agent language to allow the new agent to communicate with other agents.

2.8 Conclusion

This chapter serves as an introduction to Agent technology. It introduces the concept of an agent, describes popular agent types as well as methods for classifying agents in these types. The chapter also introduces some languages that allow for inter-agent communication as well as programming languages designed specifically for creating agents.

The chapter provides a foundation for the concept of pluggable intelligence and the need for it. It shows how research in the field of agents and agent technology has already contributed a lot to agent interoperability, and how frameworks have already been developed that allow agents to not only run on multiple platforms, but also communicate with each other.

This chapter also provides a list of Agent-Oriented Programming languages that have been created to aid a programmer in the creation of agents. These


tools are generally based on a programming language such as Java. This not only allows execution on multiple platforms but also allows the programmer to interface the agent through code already written in that language.

The next chapter introduces the concept of pluggable intelligence for agent design. It discusses reasons for creating an agent which can modify or add to its intelligence.


3 Intelligence as a Plug-In for Agents

3.1 Introduction

The previous chapter discusses what an agent is and presents a few different types of agents. This chapter discusses ways in which intelligence can be implemented in agents, so that agents may choose different models of intelligence.

This chapter develops the general theme of the dissertation. It looks at reasons for embedding intelligence into agents, different methods for doing so, as well as current research into embedded intelligence.

3.2 Why Embed Intelligence?

Intelligent agents typically operate in unpredictable domains [Bra01]. For example, an agent that detects spam cannot predict which e-mail message is going to be classified as spam. It has to analyse each message in turn. Most problems to be solved by agents are non-trivial and may require non-standard solutions [Bra01]. If the environment of an agent changes to the extent that it is unable to cope with the change, the agent needs to adapt [Bra01].

An agent has choices in the way in which it adapts itself. For example, an agent may switch to a different plan or goal or learn new facts. It may also adapt or modify its internal processes. Brazier calls these agents self-modifying agents [Bra01].


Embedded Intelligence therefore allows an agent to adapt to changes in its environment. It allows the agent to change the way it reasons and solves problems [Bra01].

The next section investigates methods in an agent’s design process that can be altered to allow for embedded intelligence.

3.3 Methods of Embedding Intelligence

There are two stages in an agent’s design that allow for different intelligence schemes to be used: compile time and run time.

3.3.1 Compile Time

The simplest method of introducing new intelligence models for agents is during compile time. Should the author of the agent require a different approach to the intelligence of the agent, he could simply code the new intelligence into the agent and recompile. This approach has a number of flaws, including:
• It assumes one has access to the source code of the agent.
• It assumes that one has programming knowledge and skill.
• It assumes that one has the time to recode the solution.

The above model does not allow for a drop-in replacement of intelligence. It forces a change in the source code, and the current agent as a whole is therefore replaced, halting its progress in achieving its set objectives.

Modifying intelligence at compile time therefore does not qualify as embedding intelligence and is not a valid solution.


The next section looks at embedding intelligence during the run time of an agent.

3.3.2 Run Time

Modifying an agent’s intelligence at run time allows the agent to attempt to continue to achieve its goal without being interrupted. For example, if an agent designed to analyse e-mail messages and mark relevant messages as spam requires knowledge on how to decode pictures, it can still analyse messages that are plain text while the relevant knowledge is programmed as a module.

This section analyses three different methods for modifying intelligence at run time. These include compiling multiple intelligence schemes at design time, allowing for intelligence schemes to be loaded at run time, and designing the agent to interpret rules for achieving its goal.

3.3.2.1 Multiple Combined Intelligence Schemes

A step up from compile-time intelligence would be to introduce more than one intelligence scheme into the agent during coding. This would then allow either the agent to select a different intelligence scheme as it sees fit, or the end user to select one. The obvious disadvantage of this approach is that only a limited number of intelligence schemes can be embedded. If new technology is discovered, the agent would have to be recompiled.

3.3.2.2 Embedding Intelligence Using Shared Modules

Embedding intelligence using shared modules allows the agent to select different intelligence schemes from shared module sources. An example would be to use dynamic-link library (DLL) files. This allows the agent to select an appropriate

intelligence scheme as well as letting the end user load a different model. This method overcomes the problem of a limited number of intelligence models, as any number of DLL files can be created and loaded into the agent either at run time or at startup. It also overcomes the problem of new technology emerging, as can be seen in figure 3.1.

Figure 3.1 – Different schemes for an Intelligence Chooser

The ‘intelligence chooser’ need not be embedded into the agent’s code. It can also be a shared resource. This then allows for easy upgrades should the component become dated. It is also possible to have a hierarchical structure for the intelligence chooser as is illustrated in figure 3.1b.


The advantage of this method of embedding intelligence is that anyone who has programming knowledge and access to a programming language can create new models of intelligence for the agent. The disadvantage is, of course, that the end user needs programming knowledge and skill. Another disadvantage is that the shared module files have to follow an exact specification, otherwise the core agent will not be able to load the appropriate data. A minimal sketch of such a loading mechanism follows.
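
As an illustration, the following minimal Java sketch loads an intelligence scheme from a shared module at run time. Java is used purely for illustration, with a jar file playing the role of the DLL; the IntelligenceScheme interface and class names are assumptions invented for this example and are not part of the dissertation's prototype:

    import java.net.URL;
    import java.net.URLClassLoader;

    // Hypothetical contract that every pluggable intelligence scheme must follow.
    // If a module deviates from this exact specification, the core agent
    // cannot load the appropriate data.
    interface IntelligenceScheme {
        boolean canHandle(String dataFormat); // e.g. "audio/mpeg"
        Object analyse(byte[] input);
    }

    public class IntelligenceLoader {
        // Loads an intelligence scheme from a shared module (a jar file here,
        // analogous to a DLL) at run time, without recompiling the agent.
        public static IntelligenceScheme load(URL moduleJar, String className)
                throws Exception {
            URLClassLoader loader = new URLClassLoader(
                    new URL[] { moduleJar },
                    IntelligenceLoader.class.getClassLoader());
            Class<?> clazz = Class.forName(className, true, loader);
            // Fail early if the module does not implement the agreed interface.
            return (IntelligenceScheme) clazz.getDeclaredConstructor().newInstance();
        }
    }

An intelligence chooser such as the one in figure 3.1 could call this loader for each available module and keep the first scheme whose canHandle method accepts the data at hand.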

3.3.2.3 Rule Based Intelligence

The optimum way of embedding intelligence is to allow the agent to interpret rules. This allows for complex structures such as figure 3.1b as well as allowing anyone to be able to add new intelligence, as it eliminates the need to have a compiler and to understand the programming language associated with that compiler.

Grosof writes that “Rules and reasoning are more accessible to users, especially non-technical users, than scripting or macro languages, yet more powerful than menus and direct manipulation in terms of the complexity of behaviour that they can specify.” [Gro95]

The success of such an application, however, depends solely on the capabilities of the agent that is interpreting the rules. If a rule based agent cannot efficiently handle things such as conflicts in its rule set, then it will exhibit unpredictable behaviour.

In 1994, IBM started development on a system called RAISE (Reusable Agent Intelligence Software Environment). It was built in different phases, the first of which gave RAISE the capability of rule-based inference, user authoring of rule bases, integration with external software components and basic support for inter-agent knowledge-level communication [IBM05a].


IBM describes RAISE by comparing it to human characteristics: The brain provides reasoning and learning, particularly yes-no rules. These rules implement beliefs that are either true or false, and not those that have a degree of truth in the manner of probabilities or fuzzy logic [IBM05a].

The body situates the brain in a larger environment. RAISE allows sensors and effectors to be procedural attachments. These can be dynamically registered as pluggable adapter components. “Linkages to sensors and effectors are treated as a syntactic and semantic extension of the pure-belief knowledge representation” [IBM05a].

Society refers to inter-agent communication. This includes support for exchanging rules and facts. RAISE uses the ARPA Knowledge Sharing Effort approach. This also provides modes of inter-agent knowledge-level communication [IBM05a].

Human-agent interaction refers to the user interface. This includes user-created rule sets. Rules are simpler for an end user to understand and help clear confusion as to why an agent is acting in a certain manner. Rules are also simpler to use than a high-level programming language [IBM05a].

RAISE includes a rule editor that allows a user to create rules or modify a set of the agent’s rules. This can also be accomplished during the run-time of the agent [IBM05a].

IBM notes that rules are particularly useful in search and retrieval techniques, specifically in free-text information retrieval and natural language analysis. They further mention that rule-based reasoning is particularly useful in filtering actions based on both structured information (for example, mail headers) and unstructured information (text bodies) [IBM05b].

The algorithm which RAISE uses to parse rules permits the use of extended form rules (these include keywords such as “and” and “or”). These rules are then transformed into Horn-form rules (a Horn-form rule permits the use of the keyword “and” but not the keyword “or”) [IBM05a].
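
To illustrate this transformation, the minimal Java sketch below represents Horn-form rules and shows an extended-form rule (one containing an "or") being distributed into two Horn-form rules. The spam-filtering predicates are invented for this example and are not taken from RAISE:

    import java.util.List;
    import java.util.Set;

    // A Horn-form rule: a conjunction of conditions ("and" only) implying one head.
    record HornRule(String head, Set<String> body) {
        boolean fires(Set<String> facts) {
            return facts.containsAll(body); // all "and"-ed conditions must hold
        }
    }

    public class HornDemo {
        public static void main(String[] args) {
            // Extended form: spam if contains_link and (unknown_sender or bad_subject).
            // Distributing over the "or" yields two Horn-form rules:
            List<HornRule> rules = List.of(
                    new HornRule("spam", Set.of("contains_link", "unknown_sender")),
                    new HornRule("spam", Set.of("contains_link", "bad_subject")));

            Set<String> facts = Set.of("contains_link", "bad_subject");
            for (HornRule rule : rules) {
                if (rule.fires(facts)) {
                    System.out.println("derived: " + rule.head());
                }
            }
        }
    }

The two Horn-form rules together accept exactly the cases the single extended-form rule accepted, which is why the transformation does not change the behaviour of the rule set.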

RAISE was discontinued in 1997. RAISE was one of the first agents to be built based on rules.

IBM has since created CommonRules. IBM describes CommonRules as “a rule-based framework for developing rule-based applications with major emphasis on maximum separation of business logic and data, conflict handling and interoperability of rules” [IBM05b]. CommonRules was developed as a Java library which allows for rapid development of rule-based applications.

The engine which interprets the rules for CommonRules is called Courteous Logic Program (CLP). This engine allows conflict resolution through mutual exclusion and prioritized override [IBM05b].

An e-commerce web site that was developed using CommonRules can communicate its business policy rules via XML to a customer application or agent, even if the site’s rules are implemented using a different rule system to that of the buyer. This allows the customer application or agent to automatically execute those rules to make plans or decisions [IBM05b].

Like RAISE, CommonRules enables non-programmer business-domain experts such as marketing managers to modify rules at run-time.

As stated, CommonRules uses its own rule interchange format which is an extension to XML. This is called Business Rules Markup Language (BRML). The rule exchange is, however, not limited to XML. It may be exchanged in a number

of ways, such as directly via Java objects, through XML or other string formats. The rule-based programs need not be programmed in Java.

The next section looks at a different type of agent framework, called the Agent Academy.

3.3.2.4 The Agent Academy

The Agent Academy was a project started in November 2001. Its main goal was to create a framework for embedding intelligence into agents through the use of data mining techniques.

The European Co-Ordination Action for Agent Based Computing believes that Intelligent Agent technology integrated with Data Mining and Knowledge Discovery will dramatically affect the way humans interact with computers [Age05, Ath03].

The Agent Academy was developed to create an integrated environment for embedding intelligence in newly created agents through the use of Data Mining techniques [Age05].

Agent Academy forms an integrated framework that receives input from its users and the Web. A user issues a request for a new agent as a set of functional specifications. The Agent Factory, a module responsible for selecting the most appropriate agent type and supplying the base code for it, handles the request. A newly created untrained agent (UA) possesses a minimal degree of intelligence, defined by the software designer.

This agent enters the Agent-Training Module, where its world perception increases substantially during a virtual interactive session with an agent master (AM). Based on the encapsulated knowledge, acquired in the knowledge

extraction phase, an AM can take part in long agent-to-agent (A2A) transactions with the UA. This process may include modifications in the agent’s decision path traversal and application of augmented adaptivity in real transaction environments [Age05].

Figure 3.2 – The Agent Academy Agent Creation Process [Sym02]

Figure 3.2 shows the Agent Academy training process.

The Agent Factory is used to create new agents on demand. The user creates an agent via a supplied user interface. The Agent Factory uses JADE (the Java Agent DEvelopment Framework) to provide compatibility between platforms. JADE also allows for the creation of agents with special characteristics.

The core of the Agent Academy is the Agent Use Repository (AUR), which is a collection of statistical data on prior agent behaviour and experience. It is on the contents of the AUR that data mining techniques, such as extraction of association rules for the decision-making process, are applied in order to augment the intelligence of the AM in the training module.

Building AUR will be a continuous process performed by a large number of mobile agents and controlled by the Data Acquisition Module. A large part of an agent’s intelligence handles the knowledge acquired by the agent since the beginning of its social life through the interaction with the environment it acts upon [Age05].

3.4 Conclusion

This chapter introduces the concept of embedding intelligence into agents. It discusses existing methods in which intelligence can be embedded into an agent, and during which parts of the agent creation process, intelligence can be implemented.

The ideal way of implementing intelligence is a rule-based approach. This allows an agent to follow a set of rules which can be changed at any time, either by the agent itself or by someone (or something) controlling the agent. The drawback of this approach is that it is very difficult to implement.

This chapter examines architectures and frameworks that have been developed over the past 17 years to create an environment for agents that base their intelligence on rules. RAISE by IBM was examined and an example was given of where it could be used.

The Agent Academy, a framework for creating intelligent agents, is discussed. This framework seeks to create intelligent agents by integrating agent technology with data mining techniques.


The next chapter defines a new agent type, namely the Modifiable Agent type. It looks at the characteristics that make up an agent that can modify its intelligence.


4 The Modifiable Agent Type

4.1 Introduction

Agent types are discussed in chapter 2; however, trying to fit every agent produced into these limited categories is no easy task. Designers therefore often place their agents into two or even three different categories. This chapter looks at reasons for creating agent type definitions and introduces a new agent type, namely the Modifiable Agent.

This chapter builds on the concepts that are described in chapter three, namely those of embedding intelligence. It analyses the different characteristics that make up a modifiable agent, provides both advantages and disadvantages of such an agent, and finally compares a modifiable agent to other more popular agent types.

4.2 Why Create Agent Type Definitions?

Software agents are always evolving, becoming more and more complex. Fixed boundaries are crossed regularly, and it becomes harder and harder to classify agents. Without the ability to classify agents, one would struggle to find an agent that performs a specific task.

As processing power grows and becomes more readily available, software agents evolve to take advantage of the greater power available to them. This causes confusion when classifying agents, as more and more characteristics of agents are incorporated into other type definitions.


Creating new agent types provides the ability to keep up with the agent evolution and allows new agents to be correctly classified and stored. This allows for faster retrieval of already created agents and therefore helps eliminate data replication.

4.3 Description of a Modifiable Agent

A Modifiable Agent is an agent that has the ability to change part or all of its internal processes during runtime.

As the description suggests, the agent is not limited to changing its intelligence model: it can also change the way it perceives its environment (sensors) or the way it interacts with its environment (actuators).

A modifiable agent that modifies its intelligence need not be able to learn. As discussed in chapter 2, such an agent type already exists, and it is possible to describe an agent using multiple types.
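To make the description concrete, the following Python sketch shows one possible shape for such an agent. The names (IntelligenceScheme, ModifiableAgent, plug_in) are illustrative and are not taken from any particular framework; the point is that sensing and acting stay fixed while the reasoning component can be replaced at runtime.

```python
from abc import ABC, abstractmethod

class IntelligenceScheme(ABC):
    """An externally supplied decision-making module."""

    @abstractmethod
    def decide(self, percept):
        """Map a percept from the sensors to an action."""

class ModifiableAgent:
    """Minimal agent whose reasoning can be replaced at runtime."""

    def __init__(self, scheme):
        self._scheme = scheme

    def plug_in(self, scheme):
        # Swap the decision-making module without restarting the agent.
        self._scheme = scheme

    def step(self, percept):
        # Sensing and acting stay fixed; only reasoning is delegated.
        return self._scheme.decide(percept)
```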

4.4 Why create an agent that can modify itself?

Software designers are always seeking to create software that makes everyday processes that much simpler. For this reason, new technologies are created every day. These technologies can quickly render existing software obsolete and force developers to spend time replacing it.

Agents generally operate in environments with many variables, making those environments difficult to predict. Added to this, the problems these agents have to solve are non-trivial and require non-standard methods. If an agent finds itself in a situation where it is unable to cope with changes in its environment, or its context as a whole, then the agent needs to adapt [Bra01].


Giving agent software the ability to modify itself inherently extends the life of the agent. The agent is easily able to adapt to changing technologies, and modification also allows bugs in its original design to be fixed, keeping the agent up to date.

New technology is not only limited to software. New hardware devices emerge on a regular basis. At the time of writing, much hype surrounded the Apple iPod. An agent that existed to perform some action with music, video or image libraries would soon be replaced if it did not pick up the trend and support the iPod.

Technologies currently exist that allow objects running on remote machines to be accessed by an agent. This allows an agent to plug in intelligence schemes running on remote machines as it sees fit, without burdening the processor on which it is running.

4.5 Disadvantages of the Modifiable Agent

The biggest disadvantage of using the modifiable agent type model is the increased processing power required. For an agent to be able to modify its intelligence, it requires some built-in code that allows it to recognise and use other intelligence schemes. On start-up, the agent needs to identify an intelligence scheme that it can use and load it. This can mean long wait times if there are many intelligence schemes available to the agent and it is trying to identify the best one to use.

The next problem comes from the intelligence schemes themselves. Loading an intelligence scheme that was not inherently built into the agent means that the agent is loading an external module, for example a DLL or an object file. Transferring data between the agent and this intelligence scheme can be a slow process, as the operating environment may have many checks in place to make sure that the data is transferred safely.

The last problem deals with the inherent complexity. As the intelligence schemes are all externally compiled, they can be very difficult to debug. This can make it hard to find small problems such as an off-by-one error.

The next section deals with self aware agents. A self-aware agent can be a useful extension to a modifiable agent as discussed later.

4.6 Self-Aware Agents

As defined in chapter two, autonomy is a key part of an agent's design. If an agent acts upon its environment without direct intervention from humans, the agent must have some method of sensing its environment, thereby making it self-aware.

Brazier and Wijngaards provide key characteristics for building a self-aware agent [Bra02]:

• Self-awareness knowledge: the agent must have knowledge that describes its functionality and behaviour, as well as nonfunctional characteristics of the agent.

• Monitoring information: the agent must be able to monitor its functioning and behaviour in a given situation (e.g. when it wishes to be adapted and re-activated).

• Self-assessment knowledge: the agent must have knowledge that determines the extent to which it may function in a given situation.


• Need formulation knowledge: the agent must have knowledge with which needs for adaptation can be determined.

• Integration: self-awareness knowledge, monitoring information, self-assessment knowledge, and need formulation knowledge must be integrated in the (internal) functioning of a self-aware agent.

• Communication: the agent must have knowledge of the way in which it can choose an agent factory, and interact with an agent factory (e.g. protocols and languages for either an intermediary or direct interaction with the factory).

The self-aware agent can add a new dimension to the modifiable agent.

4.7 The Self-Aware Modifiable Agent

The Self-Aware Modifiable (SAM) agent presents an interesting paradigm. The SAM agent is an agent that can modify its intelligence as it sees fit, i.e. depending on information received from its sensors.

A SAM agent inherits the same disadvantages as a Modifiable Agent, the biggest being how the agent decides which intelligence scheme would best suit its current environment. Other problems also present themselves, such as how the agent would decide to use a different intelligence scheme should its environment change, and how it would prevent continuous swapping of its intelligence schemes. These problems are solved by using an Intelligence Chooser as described in section 3.3.2.2. The Intelligence Chooser would need to be rather complex and have the ability to use multiple intelligence schemes simultaneously to reduce the cost of swapping out intelligence schemes.
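A minimal sketch of such an Intelligence Chooser is given below, assuming a numeric score function that rates how well a scheme suits the current environment; the names and the threshold value are illustrative only. The min_gain margin is one simple way of preventing the continuous swapping described above: the agent only switches when a candidate scheme is clearly better than the one already loaded.

```python
class IntelligenceChooser:
    """Picks the intelligence scheme that best fits the environment."""

    def __init__(self, schemes, score, min_gain=0.1):
        self.schemes = schemes      # candidate intelligence schemes
        self.score = score          # score(scheme, environment) -> float
        self.current = schemes[0]   # scheme loaded at start-up
        self.min_gain = min_gain    # hysteresis margin against thrashing

    def choose(self, environment):
        best = max(self.schemes, key=lambda s: self.score(s, environment))
        # Only swap if the best candidate is clearly better than the
        # currently loaded scheme, to avoid continuous swapping.
        if (self.score(best, environment)
                - self.score(self.current, environment) > self.min_gain):
            self.current = best
        return self.current
```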


4.8 Conclusion

This chapter introduces the concept of a modifiable agent type. Agent types are a necessity in describing agents and allow for easy categorisation.

The modifiable agent type describes an agent that can change its intelligence. This can be seen as an important step in software engineering as it allows an agent’s base to stay the same while the intelligence scheme is continually improved upon.

The two biggest disadvantages of using the modifiable agent type are the need for extra processing power and the complexity involved in designing and testing intelligence schemes. These must be carefully weighed against the advantages of using a modifiable agent before this agent type is chosen.

The self-aware modifiable agent is an agent that can automatically modify its intelligence depending on information received from its sensors. Such an agent would be complemented by a repository that stores different intelligence schemes. Should a SAM agent be faced with an environment in which it deems it cannot operate successfully, it would be able to search the repository for an intelligence scheme which it feels would work best in that environment.

Chapter four serves to provide the groundwork for creating agents that have the ability to modify their internal processes at runtime, whether automatically or by human intervention. It introduces the modifiable agent type and an extension to it, namely the self-aware modifiable agent.


The next chapter introduces digital music. The agent created to demonstrate the modifiable agent type is based in a digital music environment.


5 Background to Digital Music

5.1 Introduction

This chapter introduces digital music. The agent developed to prove the concept of pluggable intelligence is based in a digital music environment. This chapter, therefore, provides an introduction to the environment of the agent discussed in chapters 8 onwards. It provides a brief history of digital music and its impact on the recording industry. It introduces technologies such as storage media, compression techniques and portable devices that have revolutionised the music industry.

5.2 History of Digital Music

Before Compact Discs, all music was produced and distributed on vinyl discs and magnetically encoded tape. Compact-disc technology, which swept the consumer market during the late 1980s and early 1990s, was the first element in the industry's shift to digital technology [Alex02].

As a result of this shift, consumers were able to play compact-discs on their personal computers, as well as transfer the songs from the compact discs for storage on their computers [Alex02].

However, the files were generally not shared with large numbers of other users, since three minutes of music required fifty megabytes of hard drive storage space and an enormous amount of time and bandwidth to transfer across the internet.


Development of the MP3 file format dramatically changed these storage and bandwidth requirements. MP3, created by engineers at the German company Fraunhofer Gesellschaft, is an audio compression format that generates near compact-disc quality sound at approximately 1/10 to 1/20 the size [Alex02].

MP3 files allow for a single song to be transferred in approximately 5 minutes over a 56 kilobit per second modem [Alex02]. If the same song were transferred along the same channel in an uncompressed state, it would take approximately one and a half hours. Taking into consideration high speed internet connections such as Asymmetric Digital Subscriber Lines (ADSL), an entire album can be downloaded in a matter of minutes.

5.3 Growth of Digital Music

Over the past few years, physical sales of music have been decreasing. However, digital sales worldwide increased by 746 million US dollars in 2005 alone [IFP05, IFP04]. Table 5.1 shows trade revenues by format worldwide. The Recording Industry Association of America (RIAA) reports similar figures, with a net percentage change in downloaded albums of 198.5% [Ria06]. Table 5.2 shows the 2005 year end statistics for downloaded music. Neither the IFPI nor the RIAA recorded figures for downloaded music before 2004.

Apple’s iTunes, which has grown in popularity since the launch of the Apple iPod, increased its downloader subscriber base from 861,000 in July 2003 to 4.9 million in March 2004 (an increase of 570%) [Bor04].

Due to Apple’s success, other large players such as Sony, Microsoft, Virgin and Yahoo are all making plans to enter the market [Bor04].


Microsoft is planning to launch its own portable media player before the end of 2006. This device, known as the Zune, will be direct competition for Apple’s iPod, which has sold more than 58 million devices [Fri06].

Table 5.1 – Trade Revenues by Format (million $US) [IFP05]

Table 5.2 - 2005 Year End Statistics (in million US$ net after returns) [RIA05]


The move to digital music coupled with the internet provides new avenues for up and coming artists to explore. The next section looks at how the music industry is changing due to the impact of digital music.

5.4 The Impact of Digital Music

Currently the composer and recording artist rarely receive more than 16% of the purchase price of a compact disc; the rest goes to the manufacturer of the disc itself, the distributor of the disc, and the retail store or the record company that produced the recording [Fis00]. If artists chose to distribute their music over the Internet themselves, almost all costs associated with making and distributing discs could be eliminated. The result would be that musicians could earn more, consumers could pay less, or both. A number of services that allow artists to do just that have already started; an example of such a service is ArtistLed [Art06].

There are a number of advantages to Internet distribution. For example, it eliminates disappointments such as when a CD is out of stock. It also allows consumers to choose exactly the music they want and none of the music they don’t want.

The distribution of digital music via the Internet does have a substantial drawback: it undermines the ability of music creators to earn money. As MP3 files are insecure, nothing prevents a person from making an unlimited number of copies and, unlike copies made using analog technology (such as cassette tape recorders), the copies made using digital technology are identical to the original. The result is perfect MP3 copies of copyrighted recordings widely available on the Internet for free. This has sparked legal campaigns from companies such as the Recording Industry Association of America (RIAA) "against individuals engaged in non-permissive downloading of copyrighted MP3 files, the manufacturers of the machines used to play MP3 files, the operators of "pirate" Web sites and against the growing group of intermediaries that assist users in locating and obtaining MP3 files." [Fis00]

The next section looks at popular digital music compression formats, such as MP3 and Ogg Vorbis, currently in use.

5.5 Current Digital Music Technologies

5.5.1 Introduction

As discussed earlier in this chapter, the digital revolution was largely due to the introduction of Compact Discs, and later the MP3 compression format. This sparked a number of new formats, all trying to capture a share of the market from MP3. This section investigates some of the more popular formats, their designers and the advantages of using them.

5.5.2 Pulse Code Modulation (PCM)

Pulse Code Modulation (PCM) can simply be described as uncompressed sound stored in a digital format. This section introduces the technology.

5.5.2.1 Introduction

The standard number format for sampled audio signals is officially called Pulse Code Modulation (PCM). This term simply means that each signal sample is interpreted as a pulse (for example a voltage or current pulse) at a particular amplitude, which is then binary encoded. Most mainstream computer sound file formats consist of a header (containing the length, etc.) followed by 16-bit two's-complement PCM [Smi03].

PCM is uncompressed audio. PCM, or wave audio as it is more commonly known, was one of the first audio standards. It takes a large amount of disk space to store song data in wave format (approximately 10 MB per minute).

5.5.2.2 Tagging Options

PCM audio has no standard method of tagging files. Most are recognised simply by their filename, but this is very limited in what it can store (even more so if a filename length limit is imposed). Programs have been known to append text data to the end of the file, in the hope that other programs abide by the length set out in the header. However, this is not standard, and some programs play a squelch at the end of the recording when playing back such a file.

5.5.3 MPEG 1 Audio Layer 3 (MP3)

MP3 was the first mainstream audio compression format. It caused great controversy when services such as Napster started up, even prompting the Recording Industry Association of America (RIAA) to file a lawsuit and actively pursue their termination [Ria06].

5.5.3.1 Introduction

MP3 was first started in 1987 by the Fraunhofer Institute for Integrated Circuits (IIS) in joint cooperation with the University of Erlangen. The Fraunhofer IIS finally devised a very powerful algorithm that was standardised as ISO-MPEG Audio Layer-3 [Fra06].

MP3 is a lossy data compression method, meaning that compressing a file and then decompressing it yields a file that may differ from the original but is "close enough". It stores good-quality audio in small files by using psychoacoustics to discard the parts of the audio that most humans cannot hear [Aut04, Dis04]. MP3 files are generally encoded at a fixed bit rate (though it is possible to encode using a variable bit rate scheme). This means that each frame may only use a set amount of storage space for its sound sample. MP3 loses a lot of quality at low bit rates, as it simply does not have enough space to compress the data.

A recent advancement to the MP3 format is MP3PRO. MP3PRO was created to solve the problem of MP3 quality at limited bandwidth. An enhancement technology that protects high frequency components was developed to improve the sound quality of MP3 at lower bit rates [Aut04].

The technology is known as Spectral Band Replication (SBR). SBR is a very efficient method of generating the high frequency components of an audio signal. The audio format is then composed of two parts, namely the MP3 part for the low frequencies, and the SBR part for the high frequencies. The SBR part is also known as the PRO part, hence MP3PRO [Aut04].

MP3PRO is backward compatible with MP3 and also allows for multiple-channel encoding.

The MP3 format is the most widely used format on the internet. However, as it has no means of digital rights management (DRM), it is losing ground to formats such as Microsoft's WMA and AAC (used by the Apple iPod). DRM is software that protects music files from being copied illegally, usually by placing restrictions on how the files are used.

5.5.3.2 Tagging Options

The original tag for MP3s (called ID3) was a 128-byte string situated at the end of the file that contained the title, artist, album, year, genre (represented by an integer) and a comment field. The comment field was later shortened by two bytes to allow the track number to be stored as well. This tag had some very obvious drawbacks: firstly, the artist and title were fixed-length strings, so very long names could not be represented; secondly, if the file was being streamed, the tag would be the last thing to arrive [Nil05].
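The fixed 128-byte layout described above can be read directly from the end of the file. The following Python sketch reads an ID3 version 1 tag, including the later two-byte change that makes room for the track number (the so-called ID3v1.1 layout); the helper names are illustrative.

```python
def read_id3v1(path):
    """Read the 128-byte ID3 version 1 tag from the end of an MP3 file."""
    with open(path, "rb") as f:
        f.seek(-128, 2)                 # the tag occupies the last 128 bytes
        tag = f.read(128)
    if tag[:3] != b"TAG":
        return None                     # no ID3v1 tag present

    def text(field):
        return field.rstrip(b"\x00 ").decode("latin-1", "replace")

    info = {
        "title":  text(tag[3:33]),      # fixed 30-byte fields, hence the
        "artist": text(tag[33:63]),     # length limits noted above
        "album":  text(tag[63:93]),
        "year":   text(tag[93:97]),
        "genre":  tag[127],             # genre is stored as an integer
    }
    # ID3v1.1: a zero byte at offset 125 marks the comment shortened by
    # two bytes, with the track number stored in the freed final byte.
    if tag[125] == 0 and tag[126] != 0:
        info["comment"] = text(tag[97:125])
        info["track"] = tag[126]
    else:
        info["comment"] = text(tag[97:127])
    return info
```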

ID3 version 2 was developed to overcome the problems of ID3 version 1. Firstly, the data is stored at the beginning of the song. Secondly, it allows for a number of different items with very large data portions. Nilsson writes that one of the design goals of ID3 version 2 was that the tag should be very flexible and expandable [Nil05]. As the tag was designed similarly to HTML, it is very easy to add new functions: any information that a parser does not recognise is simply ignored [Nil05].

ID3 version 2 limits each frame to 16 megabytes, with the entire tag not longer than 256 megabytes [Nil05]. The large size means that you can store a large amount of information, including images such as the album cover that the song appeared on.

ID3 version 2 has set fields for storing certain information such as the artist name, the song title, the album cover and even arbitrary information such as the songwriter and the average beats per minute. However, even though the list is quite extensive, it is not limited: as noted above, any number of fields can be added, as a parser that does not recognise those fields will simply ignore them.

5.5.4 Ogg Vorbis (Ogg)

5.5.4.1 Introduction

Ogg Vorbis is a relatively new format. It was developed by the Xiph.org Foundation as an open-source rival to MP3 and WMA. The main motivating reason for developing Ogg Vorbis had to do with patents: Ogg Vorbis is fully open source and, unlike other audio compression formats, it does not require any licence fees. Ammoura and Carlacci write that Ogg Vorbis is fully open source, non-proprietary, and patent and royalty free. They continue to say that Ogg Vorbis is a general-purpose compressed audio format for mid to high quality audio and music [Amm02].

Ogg Vorbis is sometimes referred to simply as Ogg, although this is technically incorrect as Ogg is a container format while Vorbis is an audio codec.

Development on Ogg Vorbis started in September 1998 after Fraunhofer Gesellschaft announced plans to charge licensing fees for the MP3 format. The first stable release of the codec was released on the 19th of July 2002 [Amm02].

Ogg Vorbis commands a much smaller user base than that of MP3. However, as there are no licence costs involved in using Ogg Vorbis, it is growing in popularity. Large computer game manufacturers such as Epic and EA Games are starting to use Ogg Vorbis as their compression format. Even some radio stations, like Virgin Radio, have started using Ogg Vorbis [Ogg03].


5.5.4.2 Tagging

Ogg Vorbis tags, called comments, support metadata similar to that implemented in the ID3 tag standard for MP3. Comment data is stored in a vector of eight-bit strings of arbitrary length. The size of the vector and the size of each string is limited to 2^32 - 1 bytes (approximately 4 gigabytes). This vector is stored in the second header packet that begins an Ogg Vorbis bitstream [Ogg01].

Tags are generally implemented in the form of “[TAG]=[VALUE]”, for example, “ARTIST=Mozart”. There is no strict field definition, so users and encoding software may use whichever tags are appropriate. This may include tags such as Venue. Unlike the ID3 format, Ogg Vorbis comments allow for multiple tag definitions. This allows a song to be tagged correctly should it span multiple genres or be contained on multiple albums [Ogg01].
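The following Python sketch illustrates how such comments can be parsed, including the multiple tag definitions mentioned above; the example tag values are invented for illustration.

```python
def parse_vorbis_comments(comment_strings):
    """Group "[TAG]=[VALUE]" comment strings, allowing repeated tags."""
    tags = {}
    for comment in comment_strings:
        key, _, value = comment.partition("=")
        # Field names are not drawn from a fixed list, and a tag may
        # appear more than once (e.g. a song spanning several genres).
        tags.setdefault(key.upper(), []).append(value)
    return tags

comments = ["ARTIST=Mozart", "GENRE=Classical", "GENRE=Opera",
            "VENUE=Salzburg"]
print(parse_vorbis_comments(comments))
# {'ARTIST': ['Mozart'], 'GENRE': ['Classical', 'Opera'], 'VENUE': ['Salzburg']}
```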

5.5.5 Windows Media Audio (WMA)

5.5.5.1 Introduction

Microsoft originally designed Windows Media Audio (WMA) to be a competitor to the MP3 format. However, with the introduction of Apple's iTunes, it was repositioned to compete against the Advanced Audio Coding (AAC) format used by Apple [Pas06].

WMA is a proprietary compressed audio format. It has a large user base through Windows, as it forms part of Microsoft's Windows Media Framework [Aut04].

While the Windows Media Audio format itself does not provide digital rights management facilities, its container format, the Advanced Systems Format (ASF), does. The DRM technology supports time-limited music such as that offered by unlimited download services like Napster and Virgin Digital.

DRM is a big selling point for artists and record labels as it prevents digital music from being illegally distributed. Microsoft states that its Windows Media Format offers an integrated digital rights management system that allows content providers and retailers a flexible format for the secure distribution of digital music and video [Mic05].

Napster uses WMA to distribute music files legally. They allow files to be played on up to three computers, burned to compact disc and transferred to a portable music player [Son04].

5.5.6 MPEG-2 AAC

5.5.6.1 Introduction

AAC (Advanced Audio Coding) was developed by the MPEG group that includes Dolby, Fraunhofer (FhG), AT&T, Sony and Nokia [App06a]. AAC, also known as MPEG-2 AAC, is a lossy data compression scheme intended for audio streams. It was developed in 1999 as a replacement for MP3 and to solve the problem that MP3 had at low bit rates. Unlike older MPEG audio encoding methods, MPEG-2 AAC is not backwards compatible to older MPEG audio formats. For example, MP3 is backwards compatible to MP2 [Aut04].


There are no licence fees required to stream or distribute content in AAC format. This made AAC a more attractive format than MP3 (which requires royalty payments), especially for streaming content such as public radio [Via06].

AAC is the format used by Apple's iTunes store. All music bought from this store must be played either using Apple iTunes or on a portable device such as an Apple iPod [Mar05].

AAC was officially declared an international standard by the Moving Pictures Experts Group (MPEG) in April 1997.

Apple introduced its own DRM software into AAC, known as Fairplay DRM. It allows users to play their purchases on five computers using iTunes. Any playlist of purchased songs can be burned to compact disc seven times [App06b].

5.5.6.2 Tagging

MPEG-2 AAC does not have an official tagging standard. Apple's iTunes uses its own format for storing tags in AAC files (storing the information in an atom); however, Apple has not made the format public.

5.6 Conclusion

This chapter introduces the subject of digital music and the history of how digital music came to be the standard used today. It introduces some of the more common audio compression formats and looks at which manufacturers are using the formats and for what purpose. It also introduces the subject of digital rights management and how manufacturers are imposing restrictions on the use of digital music to curb piracy.


Digital music, and in particular MP3 files coupled with the internet, is changing the way music is distributed. It allows artists who cannot secure recording contracts to publish their music and provide it to the entire world. Digital music is also set to cut the cost involved in buying music, as the consumer has the ability to choose which songs he/she wants to purchase instead of being forced to buy an entire album.

The general consensus behind audio encoding is to take any wave file or a PCM (Pulse Code Modulation) source, and produce an encoding that compresses data and maintains a reasonable audible quality [Amm02].

Figure 5.1 - Ratings of different sound encoding [Fre03]

Figure 5.1 shows a comparison made by the European Broadcasting Union MUSHRA. It shows the quality of the audio reproduced by different encodings once the sound was decoded and played back [Fre03].


As all the above formats compress wave data to a fraction of its original size, it would be a mistake to support only the uncompressed wave format; its lack of any form of tagging also makes it a poor candidate. That being said, decoding any of the above formats ultimately yields wave data, so supporting any of them inherently means supporting wave files as well.

Out of all the formats, MP3 is still the most commonly used and has an excellent tagging system. Ogg Vorbis is still a new format and is slowly making its way into the market. Similar to MP3, it also has a solid tagging system.

Figure 5.2 - Comparison of average file size [Bob06]

Figure 5.2 shows a comparison of the average file size using the different encoding algorithms.

AAC is arguably the new market leader with regards to digital music, as Apple's iTunes increases sales of songs and iPods on a daily basis. Microsoft's Zune may provide more support for WMA, but this remains to be seen [Bor05].

This chapter provides a foundation for the environment in which the agent discussed later in this dissertation will be based. The agent will have to deal with music files, generally encoded in one of the formats presented in this chapter. This chapter also introduces the subject of digital rights management, used by online retail stores to protect their music files from being copied. However, DRM can also prevent the agent from decoding a song and, therefore, from correctly profiling it.

The next chapter investigates current agents that work in the area of digital music.


6 Current Agents in Digital Music

6.1 Introduction

There have been many software (and hardware) systems built to aid in music recognition, from simple systems such as a mechanical foot that taps in time to a beat [Des94] to advanced algorithms that cluster music into categories. This chapter looks at a few agents and software packages that have been created to act upon a digital music environment.

6.2 MusicBrainz and Picard

MusicBrainz is an automated tagging tool for MP3 audio files. The tool creates a hash of the MP3 music file and then compares it to submissions made by people around the world. Once a match has been made, MusicBrainz can update the metadata of the MP3 file (ID3 tag).

MusicBrainz's intelligence lies in its matching ability. It reports a match percentage (for example, a 90% match); however, it is up to the user to make sure the match is correct. MusicBrainz relies on input from users to expand its database. For example, if a new album is released, the first person to try to match its songs will not get any matches, and will therefore have to tell MusicBrainz the album name and which songs appear on the album. This information is then sent back to the MusicBrainz server so that the next person to try will get a successful hit [Mus06].

Picard is the next generation of MusicBrainz. It is slightly more intelligent than MusicBrainz, as it will search sources such as the Compact Disc Database (CDDB). A nearly unique identifier is created from the number of tracks on the album, the total play length, and a checksum generated from the individual tracks on the album. This identifier is then sent to the CDDB database, which returns a list of all songs as well as the artists for the tracks.

Should the original album not be available, Picard will try other means such as matching the ID3 tags that are already contained in the MP3 files.
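The exact identifier calculation is not reproduced here, but the following Python sketch illustrates the general idea under stated assumptions: a checksum over the individual track lengths is combined with the total play length and the number of tracks into a single value. It is an illustration only, not the actual CDDB algorithm.

```python
def disc_identifier(track_lengths_sec):
    """Illustrative disc identifier: combines a checksum over the
    individual tracks, the total play length and the number of tracks
    into one (nearly unique) integer. Not the exact CDDB algorithm."""
    checksum = sum(sum(int(d) for d in str(t)) for t in track_lengths_sec)
    total = sum(track_lengths_sec)      # total play length in seconds
    n = len(track_lengths_sec)          # number of tracks
    return (checksum % 255) << 24 | total << 8 | n

# A hypothetical four-track album with track lengths in seconds.
print(hex(disc_identifier([210, 185, 243, 199])))
```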

Figure 6.1 - Picard profiling an album [Mus07]

Figure 6.1 shows Picard profiling the self-titled album by Daughtry. It shows that three songs were high matches (the green bars next to songs 1, 10 and 12). The other matches were between 70% and 80%, represented by the orange bars next to the songs. These matches are generally correct; however, Picard warns the user by placing a reddish background behind the song. The worse the match, the darker this line is [Mus07].


The fingerprinting methods of MusicBrainz and Picard provide insight into uniquely identifying music regardless of its location. They allow a music file to be profiled by one source and the profiled information to be used by another, without attaching a location or filename to the song that was profiled.

6.3 MoodLogic

MoodLogic, developed by MoodLogic Inc., creates a playlist of music with a similar mood. It works similarly to MusicBrainz and Picard in that it generates a fingerprint of a song and then fetches profiled information from a central database. This information is also community driven, gained by asking users how a certain song affects their mood and then asking them to rank this on a low-to-high scale.

Once enough songs had been profiled, a user could select a song and then generate a playlist of similar songs: MoodLogic would select songs with a similar ranking to that of the selected song and display the list to the user. Unfortunately, due to the lack of version updates (the last version was released on 13 November 2003) and database updates, the project seems to have been discontinued [Moo05].

6.4 FixTunes

FixTunes is another tag-correcting utility. Like MusicBrainz and Picard, it creates a hash of the music file and downloads the appropriate metadata. It can also organise music into specific folders and provide updated information such as album art. Unlike that of MusicBrainz and Picard, FixTunes' database is not community driven. The software is proprietary and requires a fee to use.


6.5 Relatable

Relatable is fingerprinting software for music. It creates a TRM ID that can be used to uniquely identify music. TRM is a recursive acronym for TRM recognises Music.

TRM can be described as an audio fingerprinting technology. It attempts to generate a unique fingerprint for an audio file using an analysis of the acoustic properties of the audio itself. Creating a unique fingerprint for audio files allows a track to be identified. TRM does not rely on text identifiers, and therefore ignores any associated tag [Rel05].

TRM interprets audio information that humans actually hear, instead of working on text identifiers such as ID3 tags described in the previous chapter [Rel05].

TRM works by extracting a large number of acoustic features from an audio file. From these features, it creates the audio file’s unique fingerprint. Each fingerprint is different and identifies the specific musical track precisely. Once the fingerprint is created, it is sent to the TRM server, which matches the fingerprint to an existing song in a customer's music database [Rel05].

MusicBrainz uses TRM fingerprinting to identify songs.

6.6 MusicMagic and MusicIP

Predixis MusicMagic is a good example of software-controlled moods. MusicMagic is a closed-source commercial project, so little information about the workings of the software is available. No information is available as to how the software generates a fingerprint, but it seems able to do so at a decent speed. It is therefore assumed that only a few techniques are employed (possibly a Discrete Wavelet Transform). Although this software generates very good mixtures of songs, there does seem to be room for improvement: the songs seem to be in part limited to a specific genre. The program appears to allow for the saving of specific moods (Figure 6.2), though this functionality is disabled in the free version [Pre05].

Figure 6.2 - The Mood Option for MusicMagic [Pre05]

The manual for the program gives more insight into this feature. Each mood that was previously defined is added to the Moods Menu. Selecting one of these moods creates a play list of songs which reflects the chosen mood.

MusicMagic has the option of saving a mood. By selecting it, the program saves all songs in the current play list which have been correctly profiled. It then prompts for the name of the mood and saves this name to the Moods Menu [Pre05].

Figure 6.3 shows the output of a mix generated from the song "Good Riddance" by Green Day. It demonstrates the difficulty involved for an agent in identifying moods. For example, the song chosen for the mix (Good Riddance) is a rock ballad, yet the music selected in this mix includes the metal band Nightwish, comedy by Weird Al Yankovic and classical music by Andrea Bocelli.

Predixis MusicMagic is no longer available for download and has been replaced by MusicIP.


Figure 6.3 - Predixis MusicMagic Mixer [Pre05]

MusicIP is an upgrade to MusicMagic that also includes music analysis features that allow it to recognise musical attributes that are meaningful to humans. This allows it to classify songs that are perceivably similar [Pre07].


6.7 Beat Tracking System

Goto and Muraoka describe a system known as the Beat Tracking System (BTS). This system is designed to recognise beats in acoustic music signals in real time [Got99]. Detecting a beat in a music signal is a difficult task. Some difficulties faced include:

• Music signals consist of many instruments, so uniquely identifying a single instrument is difficult.

• Beats do not necessarily correspond to a real sound; the beat is a perceptual concept that humans feel in music.

• Detecting whether a beat is a strong beat or a weak beat can also prove difficult.

Goto and Muraoka's BTS addresses the above issues by analysing multiple possibilities of beats in parallel. BTS uses multiple agents, each predicting where the next beat is. The position of the next beat is chosen from the most reliable agent [Got99].

6.8 Conclusion

Most agents and software systems created to function within a digital music environment focus on uniquely identifying music and accurately identifying the beat of a song. This must not be overlooked, as it provides a means of recognising a song even if its location or filename changes.

Music is, however, a field which continues to grow, and more and more people are moving towards digital music (see section 5.3 for details and statistics). Despite this, research into digital music is rather limited and does not show any signs of changing.


The next chapter introduces techniques which can be used to identify music and allows agent software to group similar songs.


7 Audio Signal Classification

7.1 Introduction

This chapter discusses audio signal classification techniques which can be used to classify music into categories. The Enhanced Music Mapping Agent discussed in later chapters will use several of these techniques to classify music files.

Audio signal classification (ASC) consists of extracting relevant features from a sound, and of using these features to identify into which of a set of classes the sound is most likely to fit [Ger03a].

Gerhard suggests four steps to take when doing audio signal classification [Ger03a]:

1. Data Reduction (Features)
2. Clustering
3. Analysis Duration
4. Classification Depth

Before these steps are analysed, a few key concepts are introduced.

7.2 Beat Tracking

Much music has as its rhythmic basis a series of pulses, spaced approximately equally in time, relative to which the timing of all musical events can be described. This phenomenon is called the beat, and the individual pulses are also called beats. Human subjects are capable of finding the beat with minimal musical training; clapping or foot-tapping in time with a piece of music is not considered a remarkable skill [Dix01a]. However, as with many primitive tasks of which humans are capable with apparently little cognitive effort, attempts to model human behaviour algorithmically and reproduce it in computer software have met with limited success [Dix01b, Dix01c].

Beat tracking gives an indication to the rhythm and tempo of a song. Knowing whether a song is up-beat or whether it has a slow tempo helps in classifying the song [Dix02].

7.3 Fourier analysis

Humans and other vertebrates have an organ called the cochlea inside the ear that analyzes sound by spreading it out into its component sinusoids. One end of this organ is sensitive to low frequency sinusoids while the other end is sensitive to higher frequencies. When a sound arrives, different parts of the organ react to the different frequencies that are present in the sound, generating nerve impulses which the brain then interprets. Fourier analysis is a mathematical way to perform this function. Fourier analysis consists of decomposing a function in terms of a sum of sinusoidal basis functions. These sinusoidal basis functions can be recombined to obtain the original function. The Fourier representation of a signal shows the spectral composition of the signal [Ger03a, Opp71].

By looking at spectra (which displays the amount of each sinusoidal frequency present in a sound), a human sees a representation much more like what the brain receives when a sound is heard.

Fourier transforms have many uses, some of which include removing unwanted sounds from a recording such as “hiss” or “buzz” noises captured during recording, removing the vocals from a track to create a karaoke version of the song, or removing unwanted background sounds that exceed a preset amplitude (known as noise gating).


Figure 7.1 shows an example of a spectrum generated using a Fast Fourier Transform.

Figure 7.1 - An example of a Fourier Transform.

Fourier transforms are used in many signal analysis techniques. Examples include histograms and beat tracking [Dus04].
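As a concrete illustration, the following Python sketch (using NumPy, an assumption of this example) computes the magnitude spectrum of a signal with a Fast Fourier Transform, in the manner of Figure 7.1.

```python
import numpy as np

def magnitude_spectrum(signal, sample_rate):
    """Decompose a signal into its sinusoidal components with the FFT
    and return (frequencies, magnitudes)."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs, spectrum

# One second of a 440 Hz tone plus a quieter 880 Hz overtone at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 880 * t)
freqs, mags = magnitude_spectrum(tone, sr)
print(freqs[np.argmax(mags)])   # ~440.0 Hz, the dominant component
```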

7.4 Constant-Q analysis

In signal processing theory, Q is the ratio of the center frequency of a filter band to the bandwidth. The width of each frequency band in the constant-Q transform is related to its center frequency in the same way, and thus is a constant pitch interval wide, typically 1/3 to 1/4 of an octave. This allows for more resolution at the lower-frequency end of the representation and less resolution at the higher-frequency end of the representation, modelling the cochlear resolution pattern [Bro92, Ger03a].

Constant-Q analysis is seen by some to give a better spectral representation than a Fast Fourier Transform. The transform is similar to the human auditory system, whereby spectral resolution is better at lower frequencies, while temporal resolution improves at higher frequencies. For musical data this is a reasonable trade-off [Bro92].

Constant-Q transforms are used in extracting timbre, which aids in identifying the type of instrument that is playing at specific intervals.
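The constant-Q relationship between centre frequency and bandwidth can be made concrete with a short sketch. The following Python code computes the centre frequencies and bandwidths of a hypothetical constant-Q filter bank; the choice of four bands per octave (a quarter-octave per band) follows the typical range quoted above.

```python
import numpy as np

def constant_q_bands(f_min, f_max, bands_per_octave=4):
    """Centre frequencies and bandwidths of a constant-Q filter bank.
    With 4 bands per octave, each band is a quarter-octave wide, so the
    ratio Q = centre frequency / bandwidth is the same for every band."""
    n = int(np.ceil(bands_per_octave * np.log2(f_max / f_min)))
    centres = f_min * 2.0 ** (np.arange(n) / bands_per_octave)
    # Q = 1 / (2^(1/b) - 1), since each bandwidth is f_k * (2^(1/b) - 1).
    q = 1.0 / (2.0 ** (1.0 / bands_per_octave) - 1.0)
    bandwidths = centres / q
    return centres, bandwidths

centres, bandwidths = constant_q_bands(55.0, 7040.0)
print(centres[0], bandwidths[0])   # narrow bands at low frequencies
print(centres[-1], bandwidths[-1]) # wide bands at high frequencies
```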

There are a number of difficulties in implementing Constant-Q transformations:

1. It is more difficult to program.
2. It is more computationally intensive and therefore requires more processing power.
3. It does not guarantee a result that is invertible; that is to say, analysis followed by synthesis might not reproduce the original signal [Ger03a].

7.5 Wavelet Transforms

A Wavelet Transform, and more particularly the Discrete Wavelet Transform (DWT), is a computationally efficient technique for extracting information about non-stationary signals such as video and audio [Coo01]. It was developed as an alternative to the short-time Fourier transform (STFT) to try to overcome the frequency- and time-resolution problems of the STFT.

Wavelet transforms are used in genre classification of music. This is done by using statistical pattern recognition on feature vectors derived from wavelet analysis [Coo01].


A major drawback of the Fourier transform is that it is a representation that is based completely in the frequency domain. Using the Fourier transform, one can have information about only the frequency behaviour of the signal, without knowing when that behaviour occurred, unless a technique like the Short Time Fourier Transform (STFT) is used [Coo01].

Multi-resolution techniques look at the spectral makeup of the signal at many different time resolutions, capturing the low-frequency information about the signal over a large window and the high-frequency information over a smaller window. In the wavelet transform, this is accomplished by using a basis function that is expanded and contracted in time [Ger03a].

In the discrete wavelet transform, the wavelet is stretched to fill the entire time frame of the signal, analyzing how much low-frequency information is present in the frame. The wavelet is then scaled to fit half of the frame, and used twice to analyze the first half and the second half of the frame for slightly higher frequency information, localised to each half. Proceeding by halves, the entire frequency spectrum is covered [Coo01, Ger03a].
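A minimal illustration of this halving process is the Haar wavelet, the simplest discrete wavelet. The Python sketch below (assuming a power-of-two signal length) repeatedly splits the frame into pairwise averages, which keep the lower-frequency content, and pairwise differences, which capture the higher-frequency detail localised to each half. In practice the transform is computed from fine to coarse, which yields the same set of coefficients as the description above.

```python
import numpy as np

def haar_dwt(signal):
    """Minimal Haar discrete wavelet transform; the signal length must
    be a power of two. Returns detail coefficients per scale (finest
    first) plus the overall lowest-frequency average."""
    data = np.asarray(signal, dtype=float)
    coefficients = []
    while len(data) > 1:
        averages = (data[0::2] + data[1::2]) / np.sqrt(2)
        details = (data[0::2] - data[1::2]) / np.sqrt(2)
        coefficients.append(details)   # detail at this time resolution
        data = averages                # recurse on the coarser signal
    coefficients.append(data)          # overall average (coarsest scale)
    return coefficients

for level in haar_dwt([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]):
    print(level)
```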

Multi-resolution transforms, like the wavelet transform, attempt to cross the boundary between a purely time-domain representation and a purely frequency-domain representation. They do not correspond to "time" information or "frequency" information; rather, the information that they extract from the signal is a kind of time-frequency hybrid. Methods can be employed to extract time or frequency information from a multi-resolution representation such as the wavelet transform [Coo01, Ger03a].


Figure 7.2a – Signal [Pol07]

Figure 7.2b – 45 degree rotated view of the continuous wavelet transform of the signal in 7.2a. [Pol07]


Figure 7.2b shows a 45-degree rotated view of the continuous wavelet transform of the signal in figure 7.2a.

Wavelet transforms are similar to Fourier Transforms in that they aid in the extraction of features of a song.

7.6 Pitch analysis

Pitch detection has been a popular research topic for a number of years now. The basic problem is to extract from a sound signal the fundamental frequency (f0), which is the lowest sinusoidal component, or partial, that relates well to most of the other partials [Ger02]. In a pitched signal, most partials are harmonically related, meaning that their frequencies are related to the frequency of the lowest partial by a small whole-number ratio. The frequency of this lowest partial is the fundamental frequency of the signal [Coo02, Ger03a].

Most research in this area goes under the name of pitch detection, although what is actually being done is fundamental frequency detection. Because the psychological relationship between fundamental frequency and pitch is well known, it is not an important distinction to make, although a true pitch detector should take perceptual models into account and produce a result on a pitch scale instead of a frequency scale [Ger03a].

Marchand describes a method of pitch detection based on a combination of two Fourier transforms. He uses a first-order Fourier Transform and an algorithm which he says enhances the robustness of the detection. His method has proven very accurate and robust in practice on natural sounds, such as voice, classic musical instruments, and even some kinds of noise [Mar01].


The fundamental frequency is useful in data reduction, discussed in the next section.

Pitch has been found useful in music genre classification through the use of a pitch histogram. In its simplest form, a pitch histogram is simply an array of 128 integer values, where each value represents a note and its size represents how frequently that note occurs in the music piece [Coo02].

Figure 7.3 – Pitch histogram of a jazz song and an Irish folk song [Coo02]

Figure 7.3 shows different pitch histograms for a jazz song and an Irish folk song. The jazz song on the left shows a rich spectrum, while the Irish folk song on the right is sparse towards the lower notes, showing fewer chord changes [Coo02].
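In code, such a histogram is little more than an array of counters. The Python sketch below assumes the notes have already been detected and are supplied as note numbers between 0 and 127.

```python
def pitch_histogram(detected_notes):
    """Simplest pitch histogram: 128 counters, one per note number."""
    histogram = [0] * 128
    for note in detected_notes:        # e.g. 60 = middle C
        histogram[note] += 1
    return histogram

# A sparse histogram (few distinct notes) suggests fewer chord changes,
# as in the Irish folk example of Figure 7.3.
print(pitch_histogram([60, 60, 64, 67, 60])[60])   # 3
```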

The next section introduces data reduction. Data reduction is required to classify songs into different categories.


7.7 Data Reduction (Feature Extraction)

The first step in a classification problem is typically data reduction. Most real-world data, and in particular sound data, is very large and contains much redundancy; important features are lost in the cacophony of unreduced data. The data reduction stage is often called feature extraction, and consists of discovering a few important facts about each data item or case. The features extracted from each case are the same, so that they can be compared [Ger03a].

Feature extraction is typically the first stage in any classification system in general, and in ASC systems in particular. The features that are used in ASC systems are typically divided into two categories: perceptual and physical. Perceptual features are based on the way humans hear sound. Examples of perceptual features are pitch, timbre and rhythm. Physical features are based on statistical and mathematical properties of signals. Examples of physical features are fundamental frequency, Zero-Crossing Rate (ZCR), and Energy [Ger03a].

Feature extraction techniques are used by the Enhanced Music Mapping Agent discussed in later chapters. Once features have been extracted from a song, the EMMA can use this information to classify a song.

7.7.1 Physical Features

Physical features are typically easier to recognize and extract from a sound signal because they are directly related to physical properties of the signal itself.

7.7.1.1 Energy

Energy is a measure of how much signal there is at any one time. Energy is used to discover silence in a signal, as well as dynamic range. The energy of a signal is typically calculated on a short-time basis, by windowing the signal at a particular time, squaring the samples and taking the average [Ger03a].
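A short-time energy computation is sketched below in Python (using NumPy); the frame and hop sizes are arbitrary illustrative values.

```python
import numpy as np

def short_time_energy(signal, frame_size=1024, hop=512):
    """Mean squared sample value per window. Frames with near-zero
    energy indicate silence; the spread of values across frames gives
    an indication of dynamic range."""
    signal = np.asarray(signal, dtype=float)
    starts = range(0, len(signal) - frame_size + 1, hop)
    return np.array([np.mean(signal[s:s + frame_size] ** 2)
                     for s in starts])
```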

7.7.1.2 Zero Cross Rate

Put simply, the ZCR is a measure of how often the signal crosses zero per unit time. The idea is that the ZCR gives information about the spectral content of the signal. One of the first things that researchers used the ZCR for was to calculate the fundamental frequency. The thought was that the ZCR should be directly related to the number of times the signal repeated per unit time, which is the frequency. If the signal is spectrally deficient, like a sinusoid, then it will cross the zero line twice per cycle, as in Figure 7.4a. However, if it is spectrally rich, as in Figure 7.4b, then it might cross the zero line many more times per cycle.

Figure 7.4 - Examples of Zero-Cross rate

It has since been shown that ZCR is an informative feature in and of itself, unrelated to how well it tracks the fundamental frequency [Ger03a]. Scheirer used the ZCR as a correlate of the spectral centroid of the signal, which indicates where most of the energy of the signal is. Saunders gathered data about how the ZCR changes over time, and called this a ZCR contour. He found that the ZCR contour of speech was significantly different from that of music [Sau96, Ger03a]. One of the most attractive properties of the ZCR and its related features is that they are extremely fast to calculate [Ger03a].
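The following Python sketch computes the ZCR as sign changes per second; for a pure sinusoid the result is roughly twice the fundamental frequency, as discussed above.

```python
import numpy as np

def zero_crossing_rate(signal, sample_rate):
    """Zero crossings per second: count sign changes between successive
    samples and scale by the signal duration."""
    signal = np.asarray(signal, dtype=float)
    crossings = np.count_nonzero(
        np.signbit(signal[:-1]) != np.signbit(signal[1:]))
    return crossings * sample_rate / len(signal)

# One second of a 440 Hz sinusoid crosses zero about 880 times.
sr = 8000
t = np.arange(sr) / sr
print(zero_crossing_rate(np.sin(2 * np.pi * 440 * t), sr))  # ~880.0
```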

7.7.1.3 Spectral Features

The spectrum of a signal describes the distribution of frequencies in the signal. Spectral techniques have historically been used to analyze and classify sound. The spectrogram is the time-varying spectrum of a signal. One of the most fundamental spectral measures is bandwidth, which is a measure of the range of frequencies present in the signal [Ger03a]. This feature can be used to discriminate between speech and music: music typically has a larger bandwidth than speech, which has neither the low frequency of the bass drum nor the high frequency of the cymbal.
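The spectral centroid mentioned in the previous section, together with one simple bandwidth measure (the spread of energy around the centroid), can be computed directly from the magnitude spectrum. The Python sketch below is illustrative and uses NumPy.

```python
import numpy as np

def spectral_centroid_and_bandwidth(signal, sample_rate):
    """Centroid: magnitude-weighted mean frequency, indicating where
    most of the signal's energy lies. Bandwidth: the magnitude-weighted
    spread of frequencies around that centroid."""
    mags = np.abs(np.fft.rfft(np.asarray(signal, dtype=float)))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    centroid = np.sum(freqs * mags) / np.sum(mags)
    bandwidth = np.sqrt(np.sum(((freqs - centroid) ** 2) * mags)
                        / np.sum(mags))
    return centroid, bandwidth
```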

Figure 7.5 shows examples of instruments and a singer and their respective wave forms and spectrums.

Figure 7.5 - Spectrum and Wave Form analysis of several instruments [Pet05]


7.7.1.4 Fundamental Frequency

The fundamental frequency is only relevant for periodic or pseudo-periodic signals. Periodic signals are signals which repeat infinitely, and perceptually a periodic signal has a pitch. Pseudo-periodic signals are signals that almost repeat: there is a slight variation in the signal from period to period, but it can still be said to have a fundamental frequency, corresponding to the slowest rate at which the signal appears to repeat. It is clear that extracting the fundamental frequency from a signal only makes sense if the signal is periodic [Ger03a].

Fundamental frequency detectors often serve a dual purpose in this case: if the extracted fundamental frequency makes sense for the rest of the signal, the signal is considered periodic; if the fundamental frequency appears to vary randomly or is detected as zero, the signal is considered non-periodic. Fundamental frequency is an important feature for distinguishing between pieces of music, or for retrieving pieces of music based on the melody [Ger03a]. It also aids in the detection of timbre.
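One common physical approach, different from Marchand's Fourier-based method above, is autocorrelation: a periodic signal matches a shifted copy of itself best at a shift of one period. The Python sketch below estimates the fundamental frequency of a short frame this way; the admissible frequency range is an illustrative assumption.

```python
import numpy as np

def fundamental_frequency(frame, sample_rate, f_min=50.0, f_max=1000.0):
    """Estimate f0 of a (pseudo-)periodic frame by autocorrelation: the
    lag with the strongest self-similarity corresponds to one period.
    The frame must be longer than sample_rate / f_min samples."""
    frame = np.asarray(frame, dtype=float) - np.mean(frame)
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(sample_rate / f_max)   # shortest admissible period (samples)
    hi = int(sample_rate / f_min)   # longest admissible period (samples)
    lag = lo + int(np.argmax(corr[lo:hi]))
    return sample_rate / lag

# A 220 Hz sinusoidal frame sampled at 8 kHz.
sr = 8000
t = np.arange(2048) / sr
print(fundamental_frequency(np.sin(2 * np.pi * 220 * t), sr))  # ~220
```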

7.7.2 Perceptual Features

When extracting perceptual features from a sound, the goal is often to identify the features that humans seem to use to classify sound. Most perceptual features are related in some way to some physical feature, and in some cases, it is just as instructive to investigate the physical counterparts to these perceptual features. When physical features cannot be found that correspond to perceptual features, it is sometimes necessary to extract information, and classify the sound based on templates of sounds which have been identified to contain a certain perceptual feature [Ger03a].


This section introduces some features that can be compared to templates to classify music.

7.7.2.1 Pitch

Pitch seems to be one of the more important perceptual features, as it conveys much information about the sound. It is closely related to the physical feature of the fundamental frequency. While frequency is an absolute, numerical quantity, pitch is a relative, fluid quantity. Humans can perceive pitch in situations where current fundamental frequency detectors fail. When presented with two simultaneous pure tones at a given interval, a human can hear the fundamental frequency that would be common to both tones, if it were present. Thus, if two pure sinusoidal tones a fifth apart were played, a pure tone an octave below the lower of these tones would be perceived, as a fundamental to the perceived harmonic series. This implies that for pitch perception, the frequency spectrum of the signal is at least as important as the fundamental frequency [Ger03a].

7.7.2.2 Timbre

When humans discuss sound, they talk of pitch, intensity, and some other well- definable perceptual quantities, but some perceptible characteristics of a sound are more difficult to quantify. These characteristics are grouped together, and collectively called “timbre,” which has been defined as that quality of sound which allows the distinction of different instruments or voices that sound of the same pitch. Most of what is called timbre is due to the spectral distribution of the signal, specifically at the attack of the note. The extraction of physical features that correspond to timbral features is a difficult problem that has been investigated in psychoacoustics and music analysis without definite answers [Ger03a].


7.7.2.3 Rhythm

When a piece of sound is considered rhythmic, it often means that there are individually perceivable events in the sound that repeat in a predictable manner. The tempo of a musical piece indicates the speed at which the most fundamental of these events occur [Ger03a, Ger03b].

Tempo and rhythm are closely interlinked. From the author's experience, songs with rhythmic beats are enjoyed by a wider audience, and can be placed between songs of different tempos to change the direction of a playlist.

7.8 Significant Patterns

Cambouropoulos describes a method for classifying music by matching significant segments of music, rather than individual or multiple features. He developed a model known as the Local Boundary Detection Model (LBDM). This method analyses the differences between successive notes, such as duration and pitch [Cam00, Cam03, Cam04]. Pattern matching is then applied to significant patterns (for example, those that occur regularly). This method appears to work well for music containing single instruments.

7.9 Clustering

The next step in any classification problem is to find what feature values correspond to which categories. Once this has been decided, music files can be categorised or clustered together. The next section introduces clustering and presents some methods for clustering music into groups.

When a set of features has been extracted from a sound, the features are usually normalized to some specific numerical scale, for example the amount of rhythm-ness on a scale from 0 to 10, and then the features are assembled into a feature vector. The next task is usually to decide to which of a set of classes this feature vector most closely belongs. This is known as classification. Clustering, on the other hand, is the automatic creation of a set of classes from a large set of example feature vectors [Ger03a].

Clustering algorithms usually make use of representative cases. These are cases which represent the clusters of which they are members, and are often chosen as the case closest to the centroid of the cluster. One of the simplest clustering algorithms starts with these representative cases, and when seeking to classify a new case, simply chooses the representative case that is closest to the new case, using some feature-space distance metric. An adaptive version of this algorithm would then choose a new representative case for the cluster, based on which case is now closest to the centroid of the cluster [Ger03a].

Clustering algorithms may have the representative cases pre-determined, or may determine the representative cases in the course of the algorithm. There may be a pre-determined number of clusters, or the algorithm may determine the number of clusters that best segments the parameter space [Ger03a].

7.9.1 Neural Nets

It is possible to choose the classes beforehand, and allow the algorithm to choose parameters and map out the parameter space. This is the technique used in neural net clustering. The neural net is presented with a set of training cases, each with a corresponding class. The neural net then trains itself to select the correct class when presented with each case, and to be correct for as many of the training cases as possible. When this process is complete, the network is ready to classify new cases [Ger03a, Lea07].


The neural net is a computational technique based on a model of a biological neural network. A neural network contains neurons (or neurodes) which receive as input a group of electrical impulses, and provide as output an electrical pulse if and only if the combined magnitude of the incoming impulses is above a certain threshold. Neural networks are groups of these modelled neurons which react to input and provide output. Usually there is an input layer of neurons which accept an incoming parameter vector, one or more hidden layers which do processing, and an output layer that provides a classification [Ger03a, Lea07].

Figure 7.6 – An example of a neural network

Figure 7.6 shows an example of a complex neural network. The four grey circles represent the logic which determines which path to follow next, depending on the different weightings received from the three inputs. In the above case, the result can only be one of two outputs, represented by the green circles.

What makes the neural net powerful is not the neurons themselves, but the connections between them. Each connection has an associated weight, corresponding to how much of the signal from the source neuron is passed to the target neuron. Assuming a firing threshold of 0.5, for example, if a neuron receives input pulses from four other neurons but each connection has weight 0.1, the target neuron will not fire (the combined input is only 0.4). If, however, the weights are all 0.2, the combined input is 0.8 and the target neuron will fire, continuing the process.

Neural networks are usually set to a specific task by training. In this process, an input vector is presented along with a suggested result. The connection weights are adjusted using some algorithm to ensure that the network makes the proper classification. As more training vectors are used, the network more closely approximates a tool that can do the classification. The training of a neural network can be time-consuming, and it is difficult to tell what is going on inside the network [Ger03a, Lea07].
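The threshold behaviour described above can be sketched in a few lines. The 0.5 threshold below is an assumed example value; it is not specified in the source.

```python
def neuron_fires(inputs, weights, threshold=0.5):
    """Fire (return 1) only if the weighted sum of input pulses reaches the threshold."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

pulses = [1, 1, 1, 1]                   # four source neurons, all firing
print(neuron_fires(pulses, [0.1] * 4))  # 0: combined input 0.4 falls short
print(neuron_fires(pulses, [0.2] * 4))  # 1: combined input 0.8 fires the neuron
```

Training then amounts to adjusting the weights until the network fires correctly for as many training cases as possible.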

7.9.2 Successive Restriction

Neural networks and most other cluster-based classification techniques are synchronous, that is, a choice is made between all possible classes in one step. A different way to classify is by successive restriction, a technique related to a process of elimination. Here the classification is made over several stages, and at each stage one or more classes are removed from the list of possibilities. Successive restriction algorithms are usually designed heuristically for a specific task [Ger03a].


Successive restriction algorithms can therefore be designed specifically to cluster music; however, it may be simpler to implement established techniques such as neural nets or the k-means clustering discussed in the next section.

7.9.3 K-Means Clustering

K-means clustering is fairly straightforward, robust, quick to implement and easy to understand, but it will only classify a finite pre-determined number (k) of classes. To get around this, researchers will often set a k-means classifier to work with twice or three times as many classes as might be required, and then combine some of the clusters to form the final grouping. For example, if a researcher suspected that there were 3 classes in a case set, he might set a k-means classifier to find the best 9 clusters in the set, and then combine some of the clusters to generate the 3 best clusters [Kun02].

The algorithm begins with the first k points in the case set as starting points for the representative cases of the k classes. As the algorithm considers each new case in sequence, it chooses the k points which are furthest apart, so that when all cases have been considered, the algorithm has the best approximation of the bounding n-dimensional k-gon of the case set, where n is the dimensionality of the parameter space. Figure 7.7 shows a set of cases in 2-dimensional parameter space. The left frame of the figure shows the k furthest points selected. In this case, there are three clusters (represented by +, x and *) to find, so k = 3 [Kun02, Ger03a].


Figure 7.7 - An example of a k-means classification for three classes. [Ger03a]

K-Means clustering is not optimal in that the algorithm is not guaranteed to return a global optimum. The quality of the final solution depends largely on the initial set of clusters. In practice the final solution may be much poorer than the global optimum. However, since the algorithm is extremely fast, it is commonplace to run the algorithm several times and return the best clustering found [Kun02].
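The sketch below shows a generic k-means loop with random restarts, reflecting the practice mentioned above of running the fast algorithm several times and keeping the best clustering. It is a textbook formulation rather than the exact initialisation variant from [Kun02], and the two-dimensional test cases are hypothetical.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(cluster):
    """Centroid of a non-empty cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]

def kmeans(cases, k, iterations=50):
    centroids = random.sample(cases, k)
    for _ in range(iterations):
        # Assignment step: attach each case to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for case in cases:
            clusters[min(range(k), key=lambda i: dist2(case, centroids[i]))].append(case)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [mean(cl) if cl else centroids[i] for i, cl in enumerate(clusters)]
    return centroids, clusters

def total_error(centroids, clusters):
    """Sum of squared distances from each case to its cluster's centroid."""
    return sum(dist2(case, centroids[i]) for i, cl in enumerate(clusters) for case in cl)

# Hypothetical 2-D cases drawn around three centres, clustered with 10 restarts.
cases = [[random.gauss(cx, 1.0), random.gauss(cy, 1.0)]
         for cx, cy in [(0, 0), (5, 5), (0, 8)] for _ in range(30)]
best_centroids, best_clusters = min((kmeans(cases, k=3) for _ in range(10)),
                                    key=lambda result: total_error(*result))
```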

K-Means clustering is a useful technique when clustering music based on matching features with templates.

7.10 Analysis Duration

Feature extraction and analysis are generally done on small portions of a song, rather than the entire song at once. This can cause problems, and it is important to keep in mind how much of the signal is being used to extract the feature. Some features can be extracted from short chunks, while others require a much longer duration.

A number of the early classification systems used a single analysis frame of a fixed size, swept across the entire signal. This method works if the window size is made small enough, as features that require a longer duration may be extracted from successive frames. However, it often requires much more calculation than would be necessary if the frames used by the extraction process were of varying sizes [Ger03a].

The template-matching method for classification is also affected by analysis duration. As songs to be classified will always have varying lengths, template comparison requires some form of duration matching to overcome these differences.

One method for tackling this problem is known as linear time stretching. Linear time stretching takes the time scale of the signal and stretches it to match that of the template. This method is not very successful for audio signal classification, because it alters feature values such as frequency as it stretches.

A different method of duration matching, the Hidden Markov model (HMM), does not alter feature values. An HMM tracks the occurrence of expected events as a signal progresses [Par05].

7.10.1 Fixed Analysis Time Frame

The fixed analysis time frame is the frame-based segmentation method introduced in section 7.10. Fixed analysis time frames are easily implemented by choosing a frame size which is useful for the application, and then keeping it constant throughout the piece of music [Ger03a].


Feature extractors can be optimized for the frame size, and as sequential frames are considered, the signal can be fully examined. This method presents a problem, however: features that exist at different time scales are not all easily extracted with a single frame size. Features that exist over a very short time, like the fundamental frequency, are measured well in smaller frames, while features that exist over longer periods of time, like rhythm, are better measured using longer frames.

Multi-resolution analysis (see the following section) can solve this problem. The other option is to observe how short-time features change from frame to frame. For example, rhythm can be observed using the same frame size as the fundamental frequency, by observing the change in total signal energy from frame to frame [Ger03a].
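The frame-to-frame approach mentioned above can be sketched as follows: compute the total energy of each fixed-size frame and look at how it changes between frames, where peaks in the change suggest beats. The frame size and the toy burst signal are illustrative assumptions.

```python
import math

def frame_energies(samples, frame_size=1024):
    """Total energy of each successive fixed-size frame."""
    return [sum(s * s for s in samples[i:i + frame_size])
            for i in range(0, len(samples) - frame_size + 1, frame_size)]

def energy_changes(energies):
    """Frame-to-frame energy differences; large positive jumps suggest beats."""
    return [b - a for a, b in zip(energies, energies[1:])]

# A toy signal at 44.1 kHz: short 220 Hz bursts separated by silence, mimicking beats.
signal = []
for _ in range(8):
    signal += [math.sin(2 * math.pi * 220 * t / 44100) for t in range(2048)]
    signal += [0.0] * 4096
print(energy_changes(frame_energies(signal))[:12])
```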

7.10.2 Multi Resolution Analysis

Some classification algorithms attempt to make decisions about a signal from information contained within a single frame. This creates a classification that changes over time, with very high resolution, but does not take into account the dynamic nature of the perception of sound. The features of fundamental frequency constancy or vibrato, for example, must be measured over several frames to be valid because each frame can have only a single fundamental frequency measurement. One must observe how the fundamental frequency changes from frame to frame to see if it is constant or varying in a vibrato-like way [Ger03a].

Other classification algorithms take into account all frames in a sound before making a decision. Multi-resolution analysis is a term that includes all ways of investigating data at more than one frame size. One multi-resolution analysis technique that has received much attention is wavelet analysis. A wavelet, as the name implies, is a small wave. Wavelets have a frequency and an amplitude, just as waves (sinusoids) do, but they also have a location, which sinusoids do not have [Coo01, Ger03a].

In the wavelet transform, the input signal is represented as a sum of wavelets of different frequencies, amplitudes and locations. A set of wavelets is generated from a single mother wavelet, and the different sizes of wavelets are generated by scaling the mother wavelet. It is for this reason that wavelet analysis is multi-resolution. The low frequency wavelets extract information from the signal at a low resolution, and the high frequency wavelets extract high resolution information. The success of this technique comes from the fact that many resolutions can be investigated at one time, using one transform [Coo01, Ger03a].
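As a concrete illustration, the sketch below applies the Haar wavelet, the simplest mother wavelet, to a short signal. Each level halves the resolution: the averages carry the low-frequency content forward, while the detail coefficients capture the high-resolution information at that scale. This is a generic example, not the specific transform evaluated in [Coo01].

```python
def haar_level(signal):
    """One level of the Haar transform: pairwise averages and detail coefficients."""
    averages = [(a + b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    details  = [(a - b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    return averages, details

def haar_transform(signal, levels=3):
    """Decompose a signal into detail coefficients at several resolutions."""
    all_details = []
    for _ in range(levels):
        signal, details = haar_level(signal)
        all_details.append(details)  # finest resolution first
    return signal, all_details

approximation, details = haar_transform([4, 6, 10, 12, 8, 6, 5, 5])
print(approximation)  # [7.0] -- the coarse trend of the signal
print(details)        # [[-1.0, -1.0, 1.0, 0.0], [-3.0, 1.0], [1.0]]
```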

7.11 Conclusion

This chapter provides background on music analysis. It examines some of the techniques available for classifying music, as well as the steps each requires.

Any number of the above-mentioned techniques can be used to classify music. Depending on which are used, different results can be achieved. For example, if one were to use only beat tracking, songs with similar tempo, and possibly similar rhythm, would be clustered together.

Using multiple techniques can improve the quality of the clusters generated. However, even these can lead to imperfect results, especially where a song spans multiple genres. In such cases, template matching can be used to match features of a song to already known songs and clusters. This method requires that results be clustered (as some features will suggest different categories). For this, k-means clustering is recommended as the preferred clustering method.

The next chapter describes a model which uses the techniques described in the chapters leading up to it.


8 Model – The Intelligent Music Mixing Agent

8.1 Introduction

This chapter provides a model for an agent known as the Intelligent Music Mixing Agent (IMMA). The IMMA uses the techniques discussed in previous chapters to provide a user with a play list of songs which the user would like to listen to. The agent implements pluggable intelligence, which allows it to modify its intelligence at runtime depending on the environment in which it is placed.

8.2 The IMMA

The IMMA is a perfect-world implementation of an agent using embedded intelligence. It is able to sense its surrounding environment by detecting how much processing power is available to it, after which it chooses an appropriate analytical scheme for analysing the music files given to it. The IMMA is a portable application that can execute on any platform running on any device. Figure 8.1 shows a diagram of the component structure of the IMMA.

Figure 8.1 - The IMMA

This chapter introduces the IMMA and investigates its inner workings, namely:
1. The IMMA's Environment
2. The Embedded Intelligence Schemes
3. Skills of the IMMA
4. Obtained Knowledge of the IMMA

The next section looks at the agent’s environment, ways in which it can perceive its environment and steps it can take to react to changes in this environment.

8.3 The Environment

The IMMA needs the ability to perceive its environment so that it can choose an appropriate intelligence scheme. The agent can use two factors to evaluate its environment, namely:
• The processing power of the device it is running on
• The type of file that is to be analysed

8.3.1 Processing power

Measuring the processing power of a system allows the agent to decide which algorithms to use to classify music. As discussed in chapter 7, there are many different calculations that can be done to classify music. However, if the agent is working in an environment which has low processing power, then it may not be able to classify songs in a timely manner. This can lead to frustration on the part of the user, as the agent may not be able to provide results within an acceptable time frame.

Thus, having the agent analyse its processing environment will allow the agent to reduce the amount of calculations it does to classify music. Though this will reduce the quality of the classification, it allows the agent to produce results more quickly.

The task of deducing the processing power of the agent's environment is, however, more complex than simply measuring the clock speed of the processor or processors that the agent is executing on. For example, a processor designed specifically for encryption can process such requests much faster than a modern central processing unit (CPU), even though its clock speed is less than a twentieth of that of the CPU [McL03, Sim05]. If the agent were executing on similar hardware designed specifically for the calculations required to classify music, it would mistakenly assume that it was working in a low-processing environment and therefore choose to do only a minimal amount of calculation, negating the advantages provided by such hardware.

The agent has the option of trying each intelligence model in turn and measuring the time taken, but this measurement won't necessarily be accurate, as one song might take longer to process than another. Therefore, some user intervention in selecting the intelligence is to be expected.

The next section looks at different environments based on processing power that the agent may encounter.

8.3.1.1 Average Processing Power

Personal computers are becoming more readily available; the United States estimated that 61.8 percent of households had computers in 2003 [Usd04]. They are increasingly used to store and play digital music [RIA05, IFP05]. Profiling music at run time is not a necessity here, as it can be scheduled while the processor's load is low, allowing the agent to create a more accurate profile for each file analysed. However, it does strain the overall performance of the agent before it has gathered any data. This environment is therefore a good target for modifying the intelligence. By allowing the user to choose how much processing must be done, the agent can provide a good performance-to-accuracy ratio.

8.3.1.2 High Processing Power (Parallel Computing)

A parallel computing environment is still relatively uncommon, although with the advent of Hyper-Threading and dual-core technology, more and more applications need to consider the impact of multiprocessors on their programs [Int03, Mag05]. The focus here, however, is mostly on multi-computer or multi-processing environments where the computing power is at least four times greater than that of the average PC discussed in section 8.3.1.1. This type of computing may be advantageous in places such as radio stations, where near-perfect results in music selection are required.

With multi-computing, the task of profiling songs must be divided among the different processors available. There are three ways of doing this:
• Option 1: Divide the different profiling functions for each song among the processors
• Option 2: Divide the load of each profiling function of each song among the processors
• Option 3: Divide the task of profiling each song among the processors

The first option requires somewhat more knowledge of parallel programming to implement. It requires that each song be available to each processor; in a multi-computer environment, this means that the data must be replicated on each machine that forms part of the cluster. The programmer must also face the challenge of balancing the load between the processors, as different functions will take different amounts of time to complete, and will therefore leave some processors idle.

Option 2 requires extensive parallel programming knowledge to implement. Each profiling function is split up between the different processors available. Again, this requires that the data of each song be sent to each computer in a multi-computer environment. This option will, however, profile each individual song the fastest.

Option 3 involves the least programming overhead. A simple manager-worker paradigm allows the same code written for a PC to be used in a parallel environment. In a multi-computer environment, the data of each song is only sent to the computer that is busy profiling it. Although this profiles an individual song at the same speed as a PC (slightly slower if overhead is taken into account), it profiles a number of songs proportional to the number of processors available in the time a PC would take to profile one.

The next section looks at an imaginary case where processing power always exceeds the required level.

8.3.1.3 Infinite Processing Power

In an ideal world, each platform that the IMMA runs on would be able to do all computations faster than needed. The IMMA would then be able to profile a song as accurately as current mathematical techniques allow, and therefore accurately choose a list of songs that best suits the mood of the end user.

The next section looks at low processing environments, such as portable music players.


8.3.1.4 Low Processing Power

The final environment includes devices such as portable MP3 players, car radios and DVD players. These devices are built with very low processing capabilities and are generally designed to do only the specific function they were designed for (i.e. play music). This environment is steadily growing in popularity as devices become cheaper and able to hold more data [Mac05, Cli05]. It therefore presents an interesting challenge for the IMMA.

With the limited processing power available to the IMMA, there are three ways in which it can profile songs:
• Do no profiling on the device; have faster processors do the processing and later upload the information to the device
• Do very little processing and return inaccurate results
• Do more processing in the background, even if it takes a long time

The first option isn't very viable. Firstly, it assumes that one has a computer with more processing power and, secondly, that there is sufficient time available for this computer to profile the songs on the device.

The second option is a better one. Although inaccurate results will be produced initially, once the device has run one profiling function on each song, it can go through all the songs a second time using a different function, thereby increasing the accuracy of each profile. This can, however, take a very long time.

The third option should produce accurate profiles, but due to the lack of processing power, it may take a very long time before the device has enough data to perform any useful functions on that data.


8.3.2 Files to be analysed

The advantage of using the file type is that an intelligence model can be chosen automatically. It does not, however, give an indication of how much processing can be done on each file, and therefore, as with measuring processing power, user intervention would be required.

The environment is therefore formed by the two methods combined. However, as the IMMA will only focus on the MP3 file format, the environment will only involve the first factor. Four possible environments are considered: low processing power, average processing power, high processing power and infinite processing power.

The next section looks at embedding intelligence into the agent, and allowing the agent to modify this intelligence at run-time.

8.4 Embedding Intelligence into the Agent

This section looks at how the agent would choose an appropriate intelligence scheme for the environment in which it operates. It looks at how the agent would perceive its environment and at some of the intelligence schemes that are available for it to choose.

8.4.1 How the IMMA Perceives its Environment

The IMMA's environment comprises three components:
• The data files it works on
• The device it is executing on
• The input from human interaction


The data files provide little insight into what intelligence to use. They provide only a means of choosing which decoding algorithm to use. Therefore, focus is placed on the device, and more importantly, its processing power.

If the device the IMMA is operating on has limited processing power, then it cannot perform all possible calculations on each song, as profiling each song would take too much time and the agent would therefore be of no use. The other factor to consider is human interaction.

If songs are regularly skipped after short intervals of being played, then the selection portion is flawed. This, in turn, means that songs are not profiled accurately and more calculations need to be done on each song.

This, therefore, requires the model to be able to vary the amount of calculation it does per song. This is expanded on further under the heading of 'Song Ratings' in the next section.

The next section looks at ways of developing different intelligence schemes.

8.4.2 Different Intelligence Schemes

8.4.2.1 Introduction

As the IMMA needs to be able to modify its intelligence at runtime, it is important to analyse some of the intelligence schemes available to it. This section therefore looks at different intelligence schemes to match the processing environments discussed in the previous section. These intelligence schemes include:


1. Simple Analysis – designed for devices with low processing power
2. Complex Analysis – designed for devices with average to high processing power
3. Quantum Analysis – designed for devices with infinite processing power

Figure 8.2 - A comparison of classification methods on various data-sets [Coo01].
Key:
FFT – Fast Fourier Transform
MFCC – Mel-Frequency Cepstral Coefficients
DWTC – Discrete Wavelet Transform

Figure 8.2 shows a comparison of music genre classification using various techniques. Mel-Frequency Cepstral Coefficients perform the best, which is to be expected, as the method builds on a number of techniques, including a Fourier transform.


8.4.2.2 Simple Analysis

Simple analysis involves doing the minimum required to produce a song profile. As the processing power available to this form of analysis is limited, the analytical techniques used can vary according to the implementer's choice. Figure 8.2 gives a good comparison of some of the techniques used and their accuracy. If such techniques are not available, even simpler techniques, such as beat analysis, would have to be used; in such a case, the profiling accuracy might only match that of the Random bar in Figure 8.2. Figure 8.3 provides a comparison between popular music and classical music beats.

Figure 8.3 - Beat histograms for classical (left) and contemporary popular music (right). [Coo01]


Figure 8.3 shows differences in the beat histograms for classical and pop music. Beat histograms provide a method of grouping music into different genres.

8.4.2.3 Complex Analysis

Complex analysis involves doing most of the calculations, while skipping those that produce similar results. For example, the Fast Fourier Transform and the Discrete Wavelet Transform produce similar results (according to Figure 8.2). This analysis focuses mostly on modern PCs, which have ample processing power and can run calculations in the background without affecting the end user's work.

8.4.2.4 Quantum Analysis

Quantum analysis would be reserved for parallel computers. This would, for example, be used by the distributor, allowing for the distribution of already encoded and profiled music, thereby taking the load off the end device (whether it be a PC or portable device). Quantum analysis would perform all possible calculations, thereby increasing profiling accuracy as well as accommodating devices that don’t support all analysis methods.

8.4.2.5 Conclusion

Different analytical techniques are available to the agent depending on the processing power that is available to it. The ideal environment would be one of infinite processing power, thereby allowing the agent to do all possible calculations.


The agent requires the ability to analyse its environment and choose the intelligence scheme to match.

The next section looks at the agent’s skills. The IMMA’s skills include clustering (its ability to group songs together correctly), selection (its ability to correctly select an accurate playlist of songs) and rating (its ability to rate a song depending on interactions made by the end user).

8.5 IMMA Skills

The agent’s skills enable it to use its knowledge to provide the end user with a useful result. In this case, the IMMA will have the following skills: clustering, selection and rating.

8.5.1 Clustering

After a set of features has been extracted, one needs to decide to which of a set of classes the feature vector most closely belongs. This is known as classification. Clustering, on the other hand, is the automatic creation of a set of classes from a large set of example feature vectors. In the field of clustering, the features are usually referred to as parameters, and the feature vector representing a specific datum is called a case. In a typical clustering problem, there are a large number of cases to be clustered into a small number of categories [Ger03a]. The IMMA needs to be able to decide to which cluster a certain song belongs. It will also have to overcome the problem that some songs have been profiled using a discrete wavelet transform and others using a fast Fourier transform.


Clustering therefore requires some categories within which each song can fall. The following sections will discuss techniques which can be used to cluster songs.

8.5.1.1 By Genre

Clustering songs by genre allows the agent to choose a song which is similar to the one it is trying to match. The disadvantage of clustering by genre is that similar songs which lie outside the boundaries of a certain genre will never be chosen. For example, a ballad written by a heavy metal band might be classified as heavy metal, and therefore never be chosen alongside a ballad written by a rock band. Another disadvantage is that the number of genres is ever increasing, with some songs spanning multiple genres. Such songs would be difficult to place into just one genre, and doing so would probably break the selection process.

8.5.1.2 By Beat

Clustering by beat has the advantage that each song would be similar in pace. This would suit a dance party, where the beat is the most important factor. However, when choosing songs for a single user, choosing songs with the same beat would not be optimal. Take, for example, a radio station: the songs played alternate frequently between upbeat and low-beat songs.

8.5.1.3 By Mood

Clustering by mood enables the agent to decide effectively which song to choose next. Whether it is upbeat or slow, if it mimics the mood of the user then it is an effective choice. The problem with clustering by mood is that it is very difficult to decide what mood a song conveys, specifically as mood is a human emotion and the same song can therefore yield different results depending on the user.

8.5.1.4 Clustering Errors

The IMMA must allow for the possibility that it has made a mistake with clustering. If a user continually skips a song when it is played after a specific cluster has been chosen, but then selects the song him/herself while another cluster was being played, then the agent can deduce that the song falls inside another cluster. The agent must then either decide on the next closest cluster, or must choose the cluster linked to the songs which are played before and after the song.

8.5.2 Selection

The IMMA is required to be able to detect the ultimate mood of the end user. Of course, finding a song within the cluster is only part of the selection process. The agent must also take into account other factors such as whether the user enjoys listening to that particular song. There are several ways in which this can be achieved.

8.5.2.1 Pre-Emptive Selection

If the user has already made a song selection, the agent will be able to select a similar song to that which has already been chosen.


8.5.2.2 User Interaction

If a user skips a song within the first 10 seconds, then either he/she does not like the particular song, or he/she does not wish to listen to it at that particular time. If the latter is assumed, then another type of song can be chosen to be played next. If, however, the entire song is listened to, then it can be assumed that the correct type of song has been chosen, and another in the same cluster can be selected.

8.5.2.3 Cross Cluster Selection

Music can alter the mood of the person who is listening to it. Cross cluster selection can achieve this, though it is risky to attempt. Trying to change a person's mood from happy to unhappy could have negative consequences. Moving from one cluster to another can also be tricky. Such moves are best made by first selecting songs closer to the border of a cluster: if a song is skipped, the agent may be moving in the wrong direction, but if all songs are played through, it might be possible to move to the next cluster.

8.5.3 Rating

The IMMA’s ability to rate a song is important to its end use. If a song is rated low by the end user, but is continually picked by the agent as the next song to play, then the agent is not functioning well. However, it is very difficult for software to estimate how the user rates a specific song, although such a technique has already been discussed.

By analysing the amount of time spent listening to a specific song, the agent can build up an expected rating. If the user continually skips a song within the first ten seconds of it being played, then he/she evidently dislikes the song, and it should be selected seldom. If, however, the user continually scrolls to a specific song and plays it all the way through, then it must be one of the user's current favourite songs and can be picked regularly.

The rating algorithm needs to decay fairly quickly, as a user can go from playing a song repeatedly to not playing the song at all. The best algorithm would keep track of every time the song is played, for how long it was played, and whether it was chosen by the device or by the user him/herself. It would also keep track of which songs were played before and after such a track. This would provide enough information for the agent to decide when to play a song less often, and what to play before and after the song is selected.

8.5.4 Conclusion

As discussed, clustering is best done by mood. However, it is difficult for a machine to understand the mood of a song, and even if it could, it would yield different results from those of a human.

Song selection also offers interesting ways in which the number of song skips a user experiences can be minimised.

The next section looks at how the agent stores the knowledge that it has accumulated over a time period.


8.6 Knowledge Obtained

This section deals with the knowledge that the agent has collected to date and how it will store this knowledge for later retrieval and usage.

8.6.1 Introduction

The knowledge stored in each music file is most valuable to any program wishing to utilize the methods discussed so far. With prior information on data such as how often the song is played as well as the song’s profile, it can be chosen to great effect. This section outlines the knowledge that can be useful to the agent as well as methods for storing this information.

8.6.2 The Database

The agent requires a database of all the knowledge that it has thus far obtained. There are two ways in which the agent can store its data:

8.6.2.1 Within the song itself

As discussed previously, songs allow for the storage of tag information. The ID3 version 2 tag of the MP3 format allows any amount of data to precede the music data. This is also true for the Ogg Vorbis tag. Storing profile information in the tag itself is useful, as the data then only has to be collected once. The song can then be distributed to different devices such as portable players, thereby saving the CPU of the device for other tasks. Storing user-specific data, such as how many times the song was played, is generally not a good idea, as this would compromise the ability of the agent to select a song in a multi-user environment. For example, someone who enjoys listening to a specific song will listen to that song often, heavily increasing the number of times the song was played. If someone who does not enjoy that song uses the same device, then his/her results will be tainted by the previous person's.

Storing different profiles for each user can make the song bloated and oversized. Another disadvantage of storing the data in the song is that the agent would have to open and read the contents of every single file before it could make a decision on which song to select next - a tedious task which can take a long time.

8.6.2.2 Using a Database Management System (DBMS)

Databases have evolved considerably since their conception. They can retrieve data very quickly and store it in an optimal way. The SQL standard allows different servers to be used, depending on the user's choice, as long as the server supports it. Using a DBMS to store data is optimal for an agent, as all low-level code has been taken care of; and as SQL is a powerful language, it simplifies the selection process if queries on the data can be expressed as SQL queries. However, if profiling data is stored only in a DBMS, then it won't travel with the song should the song be copied to a portable player. In this case, the device would have to re-profile the song itself.

8.6.2.3 Conclusion

Both ways of storing data have their advantages and disadvantages. The ideal situation is to store only the data required for profiling in the tag area of the song itself, while storing personal preferences in a DBMS.

The profiling data should also be kept in the database, even though this causes data redundancy, as it will speed up the selection process. This will use up more disk space, so the cost of storing the extra data must be weighed against the cost of the extra time taken with each query.

The next section deals with a database structure. More specifically, it discusses where knowledge should optimally be stored.

8.6.3 Structure of the database

This section deals with what knowledge of a song should optimally be stored, and whether it should be stored in the song itself or in a database management system.

8.6.3.1 The Tagging Database

The tagging database should only contain profiling information. The ID3 version 2 tag of the MP3 format identifies frames by a four-character code consisting of capital letters and digits. Provision has not been made for the data the IMMA needs to store, so such tags need to be added to the standard. The standard requires that all unrecognised tags be skipped, thus allowing compatibility with players that do not support the method of profiling discussed. The proposed tags are as follows:
TPCT – The cluster the song was placed in
TFFT – The FFT used to cluster the song
TWTF – The wavelet transform used to cluster the song

Items such as beats per minute already have assigned tags (TBPM) and thus these tags should be used for compatibility.
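As a hedged illustration of how this could be written to a file, the sketch below uses the mutagen library, an implementation choice not specified in the dissertation. Since TPCT, TFFT and TWTF are not registered frame IDs, they are stored here as user-defined TXXX frames, while the standard TBPM frame carries the beats per minute as suggested above.

```python
from mutagen.id3 import ID3, TXXX, TBPM

tags = ID3("song.mp3")  # hypothetical file path
# Proposed profiling data, stored as user-defined text frames.
tags.add(TXXX(encoding=3, desc="TPCT", text=["cluster-42"]))
tags.add(TXXX(encoding=3, desc="TFFT", text=["1024-point FFT"]))
tags.add(TXXX(encoding=3, desc="TWTF", text=["haar"]))
# Beats per minute uses the already-assigned standard frame.
tags.add(TBPM(encoding=3, text=["120"]))
tags.save()
```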


8.6.3.2 The DBMS

The database management system should ultimately store everything that the tag of the song contains; fields such as Artist, Genre, etc. should therefore be included. However, the database should also contain the following:
• The last time the song was selected by the agent
• The last time the song was selected by the user
• The user's rating of the song
• The agent's rating of the song
• How often the song was skipped within 20s
• How often the song was skipped before halfway
• How often the song was listened to all the way through
• The last time the song was skipped within 20s
• The last time the song was skipped before halfway
• The last time the song was listened to all the way through
• The last song that was selected before
• The last song that was selected after

8.6.3.3 Conclusion

Storing data in the tag of a song has both advantages and disadvantages. If size allows, all data should also be stored using a DBMS, as this allows for fast lookups. Profiling-specific data should be kept in the song's tag, as this allows the information to travel with the song wherever it is transferred, thereby eliminating the need for re-profiling.


8.7 Chapter Conclusion

This chapter introduces a model of an agent that uses embedded intelligence to fulfil the task of generating a playlist of songs depending on interactions (both current and past) taken by a user. It looks at some of the intelligence schemes available to the agent, as well as how the agent should store any knowledge it has obtained. The agent then uses this knowledge to pre-empt which songs an end user would like to listen to.

The IMMA is not currently possible in a real-world situation, as custom applications cannot currently be run on devices such as the iPod.

The IMMA provides a new direction for digital music players. The ability to intelligently choose which songs to play next would without a doubt add value to a portable music player. The IMMA also provides a framework for sharing profiling data between devices that can profile songs quickly and those which cannot.

The next chapter looks at a real world prototype developed to simulate many of the functions described in this chapter.


9 Prototype – The Enhanced Music Mapping Agent

9.1 Introduction

The Enhanced Music Mapping Agent (EMMA) is a real world prototype based on the model described in the previous chapter. The design of the EMMA is modular, thus allowing for components to be inserted or removed at runtime.

This chapter introduces the interfaces of EMMA as well as the limitations imposed by current technology.

9.2 EMMA

Figure 9.1 shows the interaction between the different components of the EMMA. The Main Agent Program connects the various components of the EMMA. The User Interaction Module monitors direct human interaction with the music player and stores the resulting decisions using the Database Module.

The Intelligence Chooser selects Intelligence Plug-ins to analyse the user’s music collection. The results of this analysis are then sent via the Intelligence Chooser to the Database Module.

The Playlist Creator selects information from the database that was previously captured by the Intelligence Chooser and the User Interaction Module. It then uses this information to create a playlist of songs that the user might enjoy listening to.


The next section describes the interfaces these components use. Chapter 10 describes the components of the EMMA.

Figure 9.1 - The EMMA Module Layout

9.3 The Interfaces

The Interfaces dynamic link library (DLL) contains the interfaces for creating a module to plug into or replace an existing part of the EMMA. Ideally, this component should be changeable at runtime. However, due to the limitations of DLLs, changing this component requires recompiling every component that inherits from it, and would ultimately render all current components obsolete. Therefore, a solid and non-intrusive interface needs to be built which provides all information directly to the intelligence schemes. This negates the need to create new interfaces in the future, as all information will already be available to the intelligence scheme.

Figure 9.2 shows the EMMA interfaces.

Figure 9.2 - The EMMA Interfaces

The Interfaces DLL contains the following interfaces:
• The IPlugin Interface
• The IHost Interface
• The IDatabase Interface
• The IMain Interface

The next section looks at these interfaces in detail.

9.3.1 The IPlugin Interface

The IPlugin interface is designed for use in intelligence analysis plug-ins. It allows the intelligence choosers to communicate with the analysis plug-in. The interface contains basic methods for passing data to be analysed, as well as for setting which techniques a plug-in should use. The IPlugin interface can be extended by a plug-in chooser, which allows for the hierarchical structure described in section 3.3.2.2.

Descriptive functions:
• Name() – returns the name of the Intelligence Plug-in
• description() – returns a short description of what the plug-in does
• analysisMethods() – returns the methods this plug-in uses to analyse data. This allows the plug-in chooser to assess the worth of each plug-in.
• getWorth(analysisMethod) – returns what the plug-in thinks each analysis method is worth
• progress() – returns how far the analysis process is from completion
• isAnalysisComplete() – returns a boolean indicating whether analysis is complete

Analysis functions:
• setAnalysisData(data) – can be passed either a filename or an array of bytes, and sets the data to be analysed by the plug-in. If a filename is passed, the plug-in must decode the song data itself, whereas if raw data is passed, it has already been decoded.
• getAnalysisResult(analysisMethod) – gets the result of a specific method's analysis. Passing null should return the category the song falls under.
• startAnalysis(analysisMethod) – starts analysis on a specific analysis method. Passing null should start analysis on all available methods.
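A minimal sketch of this interface is given below in Python, using an abstract base class in place of the prototype's DLL interface; the prototype itself is not written in Python, so the method names are simply transliterated from the list above.

```python
from abc import ABC, abstractmethod

class IPlugin(ABC):
    """Abstract interface for an intelligence analysis plug-in."""

    @abstractmethod
    def name(self): ...                         # name of the Intelligence Plug-in

    @abstractmethod
    def description(self): ...                  # short description of the plug-in

    @abstractmethod
    def analysis_methods(self): ...             # methods used to analyse data

    @abstractmethod
    def get_worth(self, analysis_method): ...   # worth the plug-in attaches to a method

    @abstractmethod
    def progress(self): ...                     # how far analysis is from completion

    @abstractmethod
    def is_analysis_complete(self): ...         # True once analysis has finished

    @abstractmethod
    def set_analysis_data(self, data): ...      # filename (to be decoded) or raw bytes

    @abstractmethod
    def get_analysis_result(self, analysis_method=None): ...  # None -> song category

    @abstractmethod
    def start_analysis(self, analysis_method=None): ...       # None -> all methods
```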

Figure 9.3 The IPlugin Interface

Figure 9.3 shows the IPlugin interface including the methods described above which allows a plug-in that inherits from this interface to be accessible by the EMMA.

The Intelligence Plug-in interface provides a framework for pluggable intelligence in the EMMA. The next section looks at the interface for the intelligence chooser.


9.3.2 The IHost Interface

The IHost interface is used by intelligence choosers. It allows the main application to communicate with the plug-in chooser, and allows the intelligence analysis plug-in to communicate any unforeseen errors that may occur.

Because a plug-in chooser may choose another plug-in chooser as an analysis tool, programming this component is more difficult: an IHost component must be able to work as a master component as well as a slave component. A master component reads information from a database and passes this data on to the plug-ins it has chosen to use. A slave component must work only on data it is passed, rather than reading the information from the database.

Functions:
• ShowFeedback(Feedback) – outputs the message received by the plug-in. This will normally be an exception, but can be used for debug data as well.
• getName() – returns the name of the intelligence chooser.
• chooseIntelligence(plug-in) – chooses a specific intelligence. Passing null chooses a plug-in automatically.
• passData(data) – passes data to the plug-in. Data can be an array of bytes or a filename, as with the IPlugin.
• analyse(data) – first calls passData(data), then calls analyse() of the chosen Intelligence Plug-in. This function starts analysis on the data passed. If no data has been passed, then data is retrieved from the database.
• setDatabase(db) – sets the database that is being used by the agent. This allows the plug-in chooser to store results achieved by the analysis plug-ins, as well as retrieve information about the song it is analysing.

Figure 9.4 shows the IHost interface containing the methods described above.

Figure 9.4 The IHost Interface

The IHost interface enables the main agent program to communicate with the intelligence chooser. The next section looks at the database interface, which allows the agent to store information in any type of database.

9.3.3 The IDatabase Interface

The database interface is used by database plug-ins to store data in any required format. These can range from file databases to relational database management systems. As the database is a component, it removes the need for the plug-in choosers to know complex file operations or structured query language (SQL) queries.

A component that uses the IDatabase interface should store each individual song with a unique identification number. This allows the components of the EMMA to uniquely identify any song that it is working on.

The goal of a component that uses the IDatabase interface should be to provide consistent results by using only the simple functions provided to it.

The defined functions are:
• getName() – returns the name of this IDatabase component.
• passSetting(setting, value) – sends a setting to this IDatabase component. An example of a setting would be a username for a relational database. Returns true if the setting was set successfully.
• getItemID(item) – returns the unique identity of a specific item.
• setAttribute(itemid, attribute, value) – writes the attribute and value to the database for a specific item.
• getAttribute(itemid, attribute) – returns the current value of an attribute for a specific item.
• addSong(path) – adds the selected song to the database. This method should create a unique id for the specific song.
• getUnprocessedSong(attribute) – returns the identity of a song which doesn't have a value for the passed attribute.
• getSongPath(itemid) – returns the path to the song specified by itemid.
• setSongClassification(itemid, group) – adds the song to a specific group.
• getSongs(group, count, exclude()) – returns an array containing a number of song identities (count) that belong to the passed group. The list will exclude songs listed in the exclude array.

This allows the database to grow to any size without requiring the database module to be recompiled should new analysis methods be introduced.


Figure 9.5 The IDatabase Interface

Figure 9.5 shows the IDatabase Interface including the methods described above.

9.3.4 The IMain Interface

The IMain interface is used by the Intelligence Plug-in chooser to communicate back to the main program. It simply contains one function: ShowFeedback(Feedback). This allows the Intelligence plug-in chooser to send back relevant data as well as debug information.

Figure 9.6 The IMain Interface

Figure 9.6 shows the IMain interface.


9.4 Conclusion

This chapter introduces the skeleton structure of the EMMA. It provides information on the interfaces used by the EMMA and how they should be used. The interface structure allows some of the components of the EMMA to be interchanged at runtime; for example, one could load a new intelligence chooser when a new version comes out, without having to shut down the program.

The interface structure described allows information to be passed directly to the Intelligence Plug-ins. This lets the Intelligence Plug-ins handle all of the processing by themselves, and therefore a change to the Interfaces should not be required.

As discussed, a change to the interface structure should be avoided, as this would require existing Intelligence Plug-ins to be updated. Another option would of course be to make the interface structure itself a pluggable module. This would, however, require another level in the design of the agent and add extra overhead to the intelligence schemes.

The next chapter looks at the actual components of the EMMA which use the interfaces described in this chapter.


10 The components of the EMMA

10.1 Introduction

The previous chapter introduces the skeleton structure of the EMMA. This chapter looks in detail at the components that were built to use the structure defined. These components are:
• The Main Agent Program
• The Intelligence Plug-in Chooser
• The Intelligence Plug-ins
• The Database Module
• The User Interaction Module
• The Playlist Generator

10.2 The Main Agent Program

The main agent program ties all of the components together. It provides the graphical interface required and passes simple data, such as the filenames of songs, to be saved to the database. The main program allows different components to be changed; however, this requires user intervention. The components that can be changed are the intelligence chooser, the database module and the user interaction module.

Figure 10.1 shows a UML diagram of the components and interfaces of the EMMA.


Figure 10.1 - UML diagram of the EMMA


10.3 The Intelligence Chooser

The Intelligence Plug-in chooser provides communication between the main program and the Intelligence Plug-ins. The Intelligence Chooser was designed as a plug-in so that different choosing techniques could be used depending on the end user’s needs. Most intelligence choosers would choose only one plug-in to analyse music files. However, it is possible for it to choose multiple Intelligence Plug-ins and then specify which data analysis techniques to use. This allows the plug-in chooser to choose multiple plug-ins and select the methods which it feels will provide the best results.

The Intelligence Chooser can choose another Intelligence Plug-in chooser as its choice of plug-ins. This allows for a hierarchical structure with plug-in choosers.

Two plug-in choosers were designed for the EMMA. The first automatically tries to select the best plug-in it has available. When choosePlugin() is called, it scans the start-up directory for DLL files that inherit from the IPlugin interface. Once it has a list of Intelligence Plug-ins, it queries them one at a time for the analysis methods they provide, as well as the worth they attach to each method. It then totals these values and selects the plug-in whose total worth is closest to, but less than, the square root of the main processor's clock speed in megahertz. For example, for a processor running at 3 GHz (3000 MHz), the square root is 54.77; an Intelligence Plug-in would therefore require a total worth of less than 54.77 to be chosen by the automatic chooser.
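A minimal sketch of this selection rule follows, assuming plug-in objects that expose the IPlugin methods described in the previous chapter; clock-speed detection is stubbed out as a parameter.

```python
import math

def choose_plugin(plugins, cpu_mhz):
    """Pick the plug-in whose total worth is closest to, but below, sqrt(MHz)."""
    budget = math.sqrt(cpu_mhz)       # e.g. 3000 MHz -> 54.77
    best = None
    for plugin in plugins:
        worth = sum(plugin.get_worth(m) for m in plugin.analysis_methods())
        if worth < budget and (best is None or worth > best[0]):
            best = (worth, plugin)
    return best[1] if best else None  # None if every plug-in exceeds the budget
```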

The second plug-in allows the user to manually choose an Intelligence Plug-in. When choosePlugin() is called, it scans the start-up directory for DLL files that inherit from the IPlugin interface and creates a list of the ones which it can correctly load. It then provides the user with this list and prompts the user to select a plug-in.


The next section looks at the Intelligence Plug-ins that the plug-in choosers can load.

10.4 The Intelligence Plug-ins

The Intelligence Plug-ins analyse music files using specific techniques coded into them. They accept data in raw format and then apply any number of analytical techniques to the data. The results are then sent back to the Intelligence Plug-in chooser which in turn sends the information to the database module to be written to the database.

The analytical techniques used are:
• Fourier Transform: The Intelligence Plug-ins use Fourier transforms to create a peak output which can be compared across different songs.
• BPM detection: Using the data from the Fourier transform, the agent calculates the average bass beats per minute of the song. Using this data, the agent can classify whether the song is upbeat dance music or a slow song.

The next section describes the database module used by the EMMA.

10.5 The Database Module

The database module allows the EMMA to store information about the songs it has analysed, as well as the user interaction it has observed. The database module of the EMMA is an interface to a MySQL relational database. It takes the parameters passed to the simple functions defined in the IDatabase interface and converts them into SQL queries. This allows the modules of the EMMA to store the information they have gathered about songs for later retrieval. As this module is also pluggable, the user can use whichever database structure he wants, whether a relational database or simple text-file storage, as long as the correct module is created and inserted into the EMMA.

The next section looks at the user interaction module, which monitors interactions made by the user such as skipping songs.

10.6 User Interaction Module

10.6.1 Introduction

The User Interaction Module analyses user interaction with the music player and depending on the input received, takes certain actions. The EMMA’s module is designed to interact with the Winamp music player, designed by Nullsoft. This section looks at how the user interaction module reacts to certain events (such as a song being skipped), as well as how the module rates each song accordingly.

10.6.2 Skipping a song

Skipping a song means the user clicked the “next” button on their music player. This is different to the user selecting a specific song to be played.

There are a few reasons why a song may be skipped. These include:
• The user does not like the song
• The user does not wish to listen to the song at that particular time
• The user is looking for a particular song
• The user is bored with the song
• The user accidentally pressed next

These interactions are described in more detail:

10.6.2.1 The user does not like the song

Optimally, the agent should simply set the rating of the song to 0. However, the agent has no way of knowing whether the user dislikes the song or simply does not wish to listen to it at this particular time.

10.6.2.2 The user does not wish to listen to the song at this particular time.

In this case, the agent should leave the song rating as is, as the user may still enjoy listening to the particular song. However, it has no way of distinguishing between this case and the case described in 10.6.2.1.

10.6.2.3 The user is looking for a particular song

In this case, the user may skip songs until a particular type of song starts playing. This can cause problems for the agent, as it has no way of knowing whether the user does not enjoy these songs, as described in 10.6.2.1, or simply does not want to listen to them now, as described in 10.6.2.2.

10.6.2.4 The user is bored with the song

This case is similar to the case described in 10.6.2.1. However, the rating should not be zeroed, but simply made small so that the song is picked less often.


As all these cases are very similar, it is difficult for the agent to make an informed decision about rating the song. Therefore, the EMMA uses a sliding scale rating system.

10.6.3 The sliding scale rating system

The sliding scale rating system allows modules to rate a song depending on interaction taken by a user. This means that a module does not have to worry about a song’s previous rating while rating it on the current interaction.

The sliding scale rating system stores the last ten ratings generated by human interaction. If the song is played for less than 15 seconds, it is given a rating of 0. Otherwise, the rating is generated from the fraction of the song that was played, on a scale of 0 to 10 (10 * elapsed seconds / total seconds, rounded up). For example, if a song with a total length of 4 minutes (240 seconds) is played for 90 seconds, a 4 is added to its ratings (10 * 90 / 240 = 3.75, rounded up). When a song's current rating is required, the last ten ratings are added together and the average is taken. For example, the song in figure 10.2b will have an average rating of 4 (rounded down). This allows the system to identify whether a song is a long-time classic, whether it was a hit at some stage and has now been overplayed, or whether it is a current hit.
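The sliding scale can be sketched as follows. The class name and the use of a ten-slot queue are illustrative; the thresholds and formula come from the description above.

```python
import math
from collections import deque

class SlidingScaleRating:
    def __init__(self):
        self.ratings = deque(maxlen=10)  # only the last ten ratings are kept

    def record_play(self, elapsed, total):
        """Rate one play: 0 if skipped within 15 s, else the played fraction on a 0-10 scale."""
        if elapsed < 15:
            self.ratings.appendleft(0)
        else:
            self.ratings.appendleft(math.ceil(10 * elapsed / total))

    def current_rating(self):
        """Average of the stored ratings, rounded down."""
        return sum(self.ratings) // len(self.ratings) if self.ratings else 0

song = SlidingScaleRating()
song.record_play(90, 240)  # 90 seconds of a 4-minute song
print(song.ratings[0])     # 4, since 10 * 90 / 240 = 3.75, rounded up
```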

Figure 10.2 shows some examples of the last 10 ratings kept for 4 songs. Figure 10.2a shows a song that is enjoyed and often listened to. Figure 10.2b shows a song that was listened to a lot, but has been skipped the last 6 times. Figure 10.2c shows a song that had never been listened to, but was played through the last four times. Figure 10.2d shows a song that isn’t played very often but is still enjoyed.


Figure 10.2 - Examples of different song rating

10.6.4 Playing the full length of a song

A song can be played all the way through for a number of reasons:
• The user enjoys listening to the song
• The user is not listening to the song being played

These interactions are described in more detail:

10.6.4.1 The user enjoys listening to the song

In this case, the correct action is of course to increase the rating of the song. This will therefore allow the song to be chosen more often. The side effect of choosing a song more often is that it can become overplayed, in which case the song will be skipped by the user. However, this will merely reduce the rating of the song and it will therefore be chosen less often.


10.6.4.2 The user is not listening to the song being played

This situation can occur if, for example, the user is not in the room while the song is being played. The music player would simply keep playing songs all the way through until the user returns. Optimally, in this case, the agent should neither increase nor decrease the rating of the song. However, the assumption can be made that the agent has no means of detecting whether the user is in the near vicinity, and therefore the agent must assume that the user wishes to listen to the song.

Thus, if a song is played all the way through, the EMMA adds a new rating of 1.5 times the current average rating of the song. For example, if the song in figure 10.2d were being played, the rating in position 1 would change to 7.5, increasing the overall rating of the song.
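Continuing the hypothetical sketch above, this rule could be expressed as follows (the cap at 10 is an assumption, reflecting the maximum rating mentioned in section 10.6.5):

    def record_full_play(rating):
        # a full play adds a new rating of 1.5 times the current average,
        # assumed to be capped at the maximum rating of 10
        rating.history.appendleft(min(10, 1.5 * rating.average()))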

10.6.5 User interaction in selecting a song

The cases mentioned above all assume that the agent has chosen the set of songs being played. They do not, however, cover the special case where a user selects a specific song to be played (i.e. bypassing the songs that the agent has chosen). In this case, the agent should increase the rating of the song as much as possible, as the user obviously enjoys listening to it. This is done by setting the rating in position 1 to the maximum, which, in the case of the EMMA, is 10. This case is shown in figure 10.2c.
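In the same hypothetical sketch, an explicit user selection simply places the maximum rating in position 1:

    def record_user_selection(rating):
        # an explicit user choice earns the maximum rating of 10
        rating.history.appendleft(10)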


10.6.6 Section conclusion

This section looked at how the EMMA rates songs based on interactions made by a user. The next section looks at the playlist generator, whose purpose is to select a list of songs that the agent assumes the user will enjoy listening to.

10.7 Playlist generator

10.7.1 Introduction

The playlist generator uses the information gathered by the Intelligence Plug-ins and the interaction plug-ins to generate a playlist of songs for the user to listen to. The EMMA uses three key characteristics to select songs:

• Beat selection
• Histogram matching
• Rating-based selection

These are described in detail below:

10.7.2 Beat selection

Selecting songs by average beats per minute yields songs of a similar style, since they share the same rhythm. However, songs across all genres can be chosen this way, so a rave track might be followed by death metal. On its own, this is therefore not a viable option. The EMMA selects songs based on the last song that was played all the way through: it takes the average beats per minute of that song and then selects songs within a range of 5 beats per minute (BPM) either way. For example, if the last song played all the way through has an average BPM of 120, the EMMA selects songs with a BPM of between 115 and 125.
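A minimal sketch of this filter, assuming each song object carries a precomputed bpm attribute (a hypothetical name, not taken from the prototype):

    BPM_TOLERANCE = 5  # beats per minute either way

    def select_by_beat(songs, reference_bpm):
        # keep songs whose average BPM lies within +/-5 of the reference song
        return [song for song in songs
                if abs(song.bpm - reference_bpm) <= BPM_TOLERANCE]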

10.7.3 Histogram selection

Selecting songs based on their histograms yields songs with similar styles of music. It allows, for example, for the separation of electronic music from classical music. The EMMA selects songs with similar histograms by allowing a variation of 2 either way on each bar. The histogram that the EMMA builds comprises a spectrum of 255 frequencies. The songs selected are then compared with those selected by the beat selection described above, and the union of the two arrays is taken.
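A sketch of the comparison, again with hypothetical attribute names, assuming each song carries a 255-bin histogram; the final comment reflects the union described in the text:

    BAR_TOLERANCE = 2  # allowed variation either way on each bar

    def histograms_match(h1, h2):
        # two histograms match if every bar differs by at most 2
        return all(abs(a - b) <= BAR_TOLERANCE for a, b in zip(h1, h2))

    def select_by_histogram(songs, reference_histogram):
        return [song for song in songs
                if histograms_match(song.histogram, reference_histogram)]

    # candidates = set(select_by_beat(songs, bpm)) | set(select_by_histogram(songs, hist))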

10.7.4 Rating based

Once the EMMA has a list of viable songs, it selects ten songs from this list based on their ratings. To avoid choosing the same songs every time, the rating of a song is combined with a random number generator. The generator produces a number between 0 and 10, which is compared to the rating of the song. If the rating is greater than or equal to the generated number, the song is added to the list of songs to be played. For example, if the generator produces the number 8, only songs with an average rating of 8, 9 or 10 can be added to the play list. Similarly, if 0 is generated, any song that fits the category can be added. Once a list of 10 songs has been compiled, the list is passed to the music player to begin playing.
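A sketch of this selection step, assuming a hypothetical average_rating() accessor on each song:

    import random

    def pick_playlist(candidates, size=10):
        # randomly admit songs whose average rating meets a random 0-10 threshold
        playlist, pool = [], list(candidates)
        while pool and len(playlist) < size:
            song = random.choice(pool)
            threshold = random.randint(0, 10)  # inclusive of both 0 and 10
            if song.average_rating() >= threshold:
                playlist.append(song)
                pool.remove(song)
        return playlist

Because a threshold of 0 admits any song, the loop terminates as long as candidates remain; highly rated songs are simply admitted far more often.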

10.8 Conclusion

This chapter introduces the components of the EMMA. The agent is a real-world example based on the model described in chapter 8. The chapter describes how each of the components of the agent works, and shows how pluggable agent technology can be used to create a better working agent.

The next chapter analyses the EMMA based on measures provided by other researchers.


11 Analysis of the EMMA

11.1 Introduction

This chapter provides an analysis of the EMMA and details how the agent perceives and acts upon its environment.

11.2 Motivation and description of the EMMA

The motivation behind the proposed agent is to provide a working example of an agent that incorporates multiple intelligence schemes. These intelligence schemes should be easily interchangeable, and the agent should be able to choose a suitable scheme at runtime.

The agent described in chapters 9 and 10 is a music agent. Its purpose is to provide a user or a music player with a list of songs which are similar. Different intelligence schemes will be used to analyse the music files, depending on how much processing power is available for the agent to utilise.

11.3 The PEAS description of the agent

Russell and Norvig present a method of describing an agent by means of a table [Rus03]. A PEAS table provides a means for describing key parts of an agent’s domain.


Performance Measure: Accuracy of choice, speed of analysis
Environment: Music files, user input
Actuators: Intelligence selector
Sensors: Music file analysis, input devices

Table 11.1 – PEAS table

Performance Measure: Accuracy of the agent’s choice of music is the most important measure for the agent. If it keeps selecting music which the user does not like, then it is not performing well. However, the speed of analysis is also very important: the agent would not be very useful if it took 24 hours to analyse each song. The speed of analysis is, however, inversely related to the accuracy of the results, so a balance needs to be found.

Environment: The agent will be required to operate on a range of different music files, as well as interpret user input. The agent will also be required to operate in different processor environments, ranging from lower-power devices such as portable music players to high-power devices such as PCs.

Actuators: The agent will have the ability to select a specific intelligence model depending on its environment. It will be able to determine the processing power of its environment by either running tests or by querying its environment directly.

Sensors: The agent will sense its environment by analysing music files and by listening to keyboard and mouse input.


11.4 Properties of the task environment

Another method of analysing an agent, provided by Russell and Norvig [Rus03], is to analyse the properties of the task environment of the agent. The analysis for the task environment of the agent follows:

• Fully observable: At any point in time the agent has all the data it requires to make a decision.
• Stochastic: The next state depends completely on user interaction. One cannot determine with full certainty whether a user will want to listen to a particular song.
• Sequential: Song selection is based on user interaction: whether songs played previously were listened through or whether they were skipped.
• Dynamic: The environment can change; a user’s mood can change, the user might decide to listen to a different line of music, or new music might be added to the agent’s database.
• Discrete: The agent is limited in the number of choices it can make.
• Single agent: The agent operates by itself. Although other agents might communicate with it, it is not reliant on information that can be gathered from them.


11.5 Conclusion

This chapter analyses the real-world agent known as the EMMA, which was created and described in chapters 9 and 10. Its main purpose was to demonstrate plug-in intelligence for agents.

The agent was created to be a music agent which attempts to select songs which the current user might enjoy listening to. The plug-in nature of the intelligence components of the agent allows it to be modified to work in a number of environments and to be continually updated.


12 Operation of the EMMA

12.1 Introduction

This chapter describes the functionality of the EMMA prototype, covering each of the components of the EMMA described in chapters nine and ten.

12.2 The Main Agent Window

The EMMA requires the use of Nullsoft’s Winamp media player. Upon start-up, a check is made to see whether or not Winamp is running. If not, it attempts to start the application. Figure 12.1 shows the main window and Winamp on start-up.

Once the EMMA has been started, it polls Winamp every 500ms to get information on what song is currently playing.

12.3 The User Interaction Module

Once the EMMA has information on what song is being played, it compares this to the last song that was playing; if they differ, it knows that the song has changed. The EMMA then reads the song’s tag to determine whether information such as the beats per minute is stored there. If so, it stores this information in its database, thereby avoiding having to re-profile those aspects of the song.
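The polling and change-detection loop might look roughly as follows; the player and database accessors here are hypothetical stand-ins, not the prototype’s actual API:

    import time

    POLL_INTERVAL = 0.5  # the EMMA polls Winamp every 500 ms

    def watch_player(player, database):
        last_song = None
        while True:
            song = player.current_song()       # hypothetical Winamp query
            if song != last_song:              # the track has changed
                tags = player.read_tags(song)  # e.g. beats per minute
                if "bpm" in tags and not database.has_profile(song, "bpm"):
                    database.store(song, "bpm", tags["bpm"])  # avoid re-profiling
                last_song = song
            time.sleep(POLL_INTERVAL)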


Figure 12.1 - The EMMA and Winamp on Start-up

12.4 Selecting Components

The EMMA allows some of its components to be changed at run-time. Selecting the “Plugin Chooser” option from the “Select” menu allows the user to modify the intelligence used to profile songs. Figure 12.2 shows the plug-in chooser window. The Plugin Chooser scans the working directory of the application for DLL files. Once it has a list, it attempts to load each DLL file against the IHost interface; if this succeeds, the plug-in is listed as an option to load.
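The prototype performs this scan over Windows DLLs loaded against the IHost interface; as a language-neutral illustration, the sketch below performs the analogous scan over Python modules, treating a small set of required methods as a stand-in for the interface check:

    import os
    import importlib.util

    REQUIRED_METHODS = ("initialize",)  # stand-in for the IHost interface

    def scan_for_plugins(directory):
        found = []
        for name in os.listdir(directory):
            if not name.endswith(".py"):  # the prototype scans for .dll files
                continue
            path = os.path.join(directory, name)
            spec = importlib.util.spec_from_file_location(name[:-3], path)
            module = importlib.util.module_from_spec(spec)
            try:
                spec.loader.exec_module(module)
            except Exception:
                continue                  # not loadable; skip it
            if all(hasattr(module, m) for m in REQUIRED_METHODS):
                found.append(module)      # conforms, so offer it as an option
        return found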


Figure 12.2 - The Plugin Chooser

The current implementation of the EMMA provides two plug-in choosers. The first, named the Automatic Chooser, will scan the working directory of the application for any available plug-ins and will automatically choose one depending on the processing power available to it. The second, named the Manual Chooser, allows the user to decide which intelligence scheme to use.


12.4.1 The Automatic Chooser

Figure 12.3 - The Automatic Chooser

Selecting the automatic chooser and clicking “Load” loads the plug-in into the EMMA. The EMMA then executes an initialize call to the Automatic Chooser, which in turn scans the working directory in much the same way as the plug-in chooser does. The Automatic Chooser, however, attempts to load DLL files against the IPlugin interface.

As the Automatic Chooser successfully loads DLL files, it queries the methods that each of these plug-ins uses to profile a song (using the analysisMethods() function). The Automatic Chooser then queries each Intelligence Plug-in on what it deems the worth of each of these methods to be. It adds these values together and then chooses the Intelligence Plug-in whose total is closest to, but does not exceed, the target worth.
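The selection rule can be summarised in a short sketch; method_worth() and analysis_methods() are hypothetical stand-ins for the prototype’s analysisMethods() query:

    def choose_plugin(plugins, target_worth):
        # pick the plug-in whose summed worth is closest to,
        # but does not exceed, the target worth
        best, best_worth = None, -1
        for plugin in plugins:
            worth = sum(plugin.method_worth(m) for m in plugin.analysis_methods())
            if best_worth < worth <= target_worth:
                best, best_worth = plugin, worth
        return best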

12.4.2 The Manual Chooser

Figure 12.4 - The Manual Chooser

Selecting the manual chooser brings up a window that prompts the user to select an intelligence scheme. Once a plug-in is chosen, the Intelligence Plug-in is loaded into the EMMA.

12.5 Song Rating

The EMMA will automatically modify the rating of a song should it be skipped or played all the way through. Figure 12.5 shows the EMMA modifying the rating of a song that was played all the way through, and of one that was skipped at the start.

Figure 12.5 - Rating Changes

The effect on the database is shown in figure 12.6, where the last actions taken on the song were to play it all the way through, listen to eighty percent of it and skip it at the beginning. As the song has only been played 3 times, only three rating values are stored. Its average rating is, however, 6 ((10 + 8 + 0) / 3), as explained in section 10.6.3.

Figure 12.6 - Song rating in the database

12.6 Playlist generation

The EMMA will generate a playlist depending on the current song selected or playing in Winamp. Figure 12.7 shows a playlist of five songs generated from the current song that was playing in Winamp.


Figure 12.7 - EMMA Playlist selection

12.7 Conclusion

This chapter describes the operation of the EMMA prototype as described in chapters nine through eleven.

The EMMA performs quite well on small data sets (fewer than 100 songs in its database) where most of the songs have been fully profiled and ratings have been set accordingly (in other words, most of the songs have been played at least once).


When the dataset grows large (for example, a database of more than 1000 songs), profiling errors become more pronounced: at least one incorrect song appears in every ten-song play list generated (10%). With such a large dataset, the number of songs played which the user does not enjoy listening to is also more pronounced. However, as ratings for these songs accumulate (through the user skipping them within the first 15 seconds), they are dropped from the play list because their average ratings no longer fall within the generated range.


13 Conclusion and Further Research

13.1 Conclusion

Computer processing power is ever increasing, as new methods are developed to increase the speed or the number of simultaneous calculations that a processor can make. This allows for the creation of agents that can keep running without needing to be shut down when some part of the agent must be upgraded. Such agents can continue serving other agents or software programs without interruption, provided those clients were using parts of the agent that do not require the upgrade.

The music industry has always been large. Digital music allows music to be obtained easily and stored in large quantities, as evidenced by the sales of Apple iPods and the large increase in digital music sales reported by organisations such as the RIAA. However, a music collection can easily expand to an unmanageable size. Intelligent digital music selection alleviates this problem by selecting music that the user would like to listen to.

This dissertation provides a new concept for developing agents and introduces the concept of digital music selection. These two topics are discussed in more detail below.

13.1.1 Pluggable Intelligence

The goal of this dissertation is to introduce a new concept for developing agents. Pluggable intelligence allows agents to adapt more easily to changes in their environment should they need to. Creating agents that modify their intelligence is a step towards creating self-evolving agents. For example, a spam detection agent can discover new heuristics that allow it to catch spam more efficiently, or even decrease the number of false positives.

Multi-Tier Intelligence Schemes (as described in section 3.3.2.2) provide an ever-expanding model for pluggable intelligence. They allow an agent’s intelligence profile to grow without bound, without the need for the main agent program to track the intelligence schemes being used. This allows an agent to grow from a simple keyword detection system into a spam prevention system incorporating everything from fuzzy keyword and sentence recognition to optical character recognition for image spam.

Pluggable intelligence was implemented in the EMMA by means of dynamic-link libraries (DLLs), as the agent was written for the Microsoft Windows platform. However, equivalent techniques are available on other platforms, such as shared objects in Unix-like environments.

Pluggable intelligence for agents provides new opportunities to improve upon previous work. It saves on development costs in the long term and provides a method for other developers to integrate their software.

13.1.2 Digital Music

The agent created to demonstrate pluggable intelligence operates in a digital music environment. Music plays a large role in human life, as can be seen from the advancements in the field (vinyl, tapes, CDs, DVDs, iPods).

Digital music devices are becoming more and more popular; even cellular phones can store and play back full-length songs. These devices provide ever larger storage space, and services such as Apple’s iTunes online store provide easier access to music.


Apple claims that its iPods are able to store ten thousand different songs. Navigating such a collection and creating a play list of songs to listen to can become a very laborious task. This is why intelligent digital music selection will become a necessity in the near future.

This technology will no doubt be adopted by companies selling individual songs in digital format. Selecting songs the user might like to listen to presents an opportunity to increase sales, as well as to provide the user with a one-click option for getting music which he or she will enjoy listening to.

13.2 Further Research

This section provides information on areas which can be improved upon.

13.2.1 Module loading

The EMMA relies on dynamic-link library files that inherit from a central file containing all the interfaces the agent uses. A next step would be to investigate the creation of files that can be loaded based on structures they themselves define, thereby eliminating the dependency on the interfaces DLL. Such definitions could, for example, be created as an extension of XML.

13.2.2 Analytical techniques

The field of digital signal processing is quite broad, and new analytical techniques are proposed regularly. This dissertation only introduces some concepts with proven results. There are, however, other analytical techniques that are still in the discovery phase or that have inconsistent results. Implementing these might provide for more accurate music selection.

13.2.3 User interaction

User interaction plays a vital role in determining the correct selection of music. Study of human psychology may provide more insight into why humans make certain decisions. This may give the agent a better understanding of how to rate and select music which the user may want to listen to.

13.2.4 Play list generation

The play list generation method can be improved to incorporate more of the data generated by the analysis modules. This would improve the rate at which successful songs are selected.

13.3 Final Word

Digital music is currently gaining huge momentum. With new versions of Apple’s iPod being released frequently, and with competitors such as Microsoft entering the market, high-capacity digital music players are becoming cheaper and cheaper. This has led to greater uptake of digital music by the general public. It has also spurred a decrease in the cost of buying music, as people now have the choice of purchasing just the songs they want instead of the entire album.

Even though there has been research into digital music analysis techniques, much of it has been adapted from techniques used in other signal processing applications. However, research in the field appears to be increasing.


Pluggable intelligence creates a new breed of agent. Even though the idea has been around for some time, the processing power available to agents was generally not great enough to warrant pluggable intelligence schemes at run-time. The Agent Academy is a prime example of existing software that allows intelligence schemes to be plugged in at design time.

However, as personal computers now have far more processing power available to them, the door is open to pluggable intelligence. It even allows for a design in which an agent can modify its intelligence depending on its situation and needs. This means that an agent could, for example, come across a document that it cannot read (such as a PDF file). It could then search a repository of agent intelligence schemes for one that can read such files, returning with an Adobe Acrobat Reader intelligence scheme. Once this is plugged in, the agent can decipher the file and continue with its search.


Bibliography

[Age05] The European Co-Ordination Action for Agent Based Computing. http://www.agentlink.org/about/faq.html. Accessed on 2005-05-15

[Age07a] AgentBuilder Product Description. http://www.agentbuilder.com/Documentation/product.html. Accessed on 2007-08-10.

[Age07b] AgentBuilder: Agent Applications. http://www.agentbuilder.com/AgentTechnology/agentApplications.ht ml. Accessed on 2007-08-10.

[Alek03] Markus Aleksy and Axel Korthaus and Martin Schader. CARLA - A CORBA-based Architecture for Lightweight Agents. Intelligent Agent Technology, 2003. IAT 2003. IEEE/WIC International Conference Page 111. October 2003

[Alex02] Peter J. Alexander. Peer-to-Peer File Sharing: The Case of the Music Recording Industry. Review of Industrial Organization. Volume 20, Number 2. Pages 151 – 161. March, 2002

[Amm02] Ayman Ammoura and Franco Carlacci. Ogg Vorbis and MP3 Audio Stream characterization. University of Alberta. Department of Computing Science. Apr 2002

[App06a] Apple Computer Inc. AAC Audio – Small files. Large Sounds. http://www.apple.com/quicktime/technologies/aac/. Accessed on 5 July 2006.


[App06b] Apple Computer Inc. ITunes Customer Service, http://www.apple.com/uk/support/itunes-ie/authorization.html. Accessed on 26 November 2006.

[Art06] Artistled - Classical music's first musician-directed, internet-based recording company. http://www.artistled.com. Accessed on 10 Sep 2006.

[Ath03] N. Athanasiadis, P. A. Mitkas, G. B. Laleci and Y. Kabak. Embedding Data-Driven Decision Strategies on Software Agents: The Case of a Multi-Agent System for Monitoring Air-Quality Indexes. ISBN 90 5809 622 X, pages 23-30. 2003

[Aut04] Henri Autti and Johnny Biström. Mobile Audio – from MP3 to AAC and further. Helsinki University of Technology. Telecommunications Software and Multimedia Laboratory T-111.550 Multimedia Seminar: Mobile Multimedia Application Platforms. 2004.

[Ber01] A. L. Berenzweig and D. P. Ellis. Locating singing voice segments within musical signals. In Proc. Int. Workshop on Applications of Signal Processing to Audio and Acoustics WASPAA, pages 119-123, Mohonk, NY, 2001.

[Bob06] Bobulous Central. Lossy audio formats comparison, Audio formats comparison 2006. http://www.bobulous.org.uk/misc/lossy_audio_2006.html. Accessed on 23rd November 2006

[Bor04] Borland, J. Microsoft’s iPod Killer? CNET News, April 2, 2004. http://news.com.com/2100-1027_3-5183692.html. Accessed on 25 April 2005

[Bor05] Borland, J. ITunes outsells traditional music stores. CNet News. http://news.com.com/iTunes+outsells+traditional+music+stores/210 0-1027_3-5965314.html. Nov 2005. Accessed on 26 November 2006.

[Bra01] Frances M.T. Brazier and Niek J.E. Wijngaards. Designing Self- Modifying Agents. Computational and Cognitive Models of Creative Design V, pp. 93-112, December, 2001

[Bra02] Frances M.T. Brazier and Niek J.E. Wijngaards. Automated (RE-)Design Of Software Agents. Proceedings of the Artificial Intelligence in Design Conference (AID2002), Gero, J.S. (editor), pp. 503-520, Kluwer Academic Publishers, 2002

[Bro92] J Brown and M Puckette. An efficient algorithm for the calculation of a constant Q transform. The Journal of the Acoustical Society of America vol. 92, no. 5, pp. 2698-2701, 1992.

[Bus99] P. Busetta and R. Ronnquist and A. Hodgson and A. Lucas. Jack intelligent agents - components for intelligent agents in java. AgentLink News Letter, January 1999.

[Cam00] Emilios Cambouropoulos. Extracting ’Significant’ Patterns from Musical Strings: Some Interesting Problems. Invited Talk presented at London String Days (LSD2000), King's College London, U.K. April 2000


[Cam03] Emilios Cambouropoulos. Musical Pattern Extraction for Melodic Segmentation. In Proceedings of the Fifth Triennial ESCOM conference, Hanover, Germany. 2003

[Cam04] Emilios Cambouropoulos and C. Tsougras. Influence of Musical Similarity on Melodic Segmentation: Representations and Algorithms. In Proceedings of the International Conference on Sound and Music Computing (SMC04), Paris, France 2004

[Col02] Rem William Collier. Agent Factory: A Framework for the Engineering of Agent-Oriented Applications. PhD Thesis. University College Dublin, Belfield, Dublin 4, Dec 2002

[Coo01] Perry Cook, Georg Essl and George Tzanetakis. Audio Analysis using the Discrete Wavelet Transform. In. Proc. WSES Int. Conf. Acoustics and Music: Theory and Applications (AMTA 2001) Skiathos, Greece, 2001

[Coo02] Perry Cook, Andrey Ermolinskyi and George Tzanetakis. Pitch Histograms in Audio and Symbolic Music Information Retrieval. In Proc. Int. Conference on Music Information Retrieval (ISMIR), Paris, France, October 2002

[Des94] P. Desain and H. Honing. Foot-Tapping: a brief introduction to beat induction. In Proceedings of the 1994 International Computer Music Conference. 78-79. San Francisco: International Computer Music Association. 1994


[Dig97] Frank Dignum and Rosaria Conte. Intentional Agents and Goal Formation. Lecture Notes In Computer Science; Vol. 1365. Proceedings of the 4th International Workshop on Intelligent Agents IV, Agent Theories, Architectures, and Languages. Pages: 231 – 243. 1997

[Dil99] P. Dillenbourg. What do you mean by collaborative learning?. In P. Dillenbourg (Ed) Collaborative-learning: Cognitive and Computational Approaches. (pp.1-19). Oxford: Elsevier. 1999.

[Dis04] Sascha Disch, Christian Ertel, Jürgen Herre, Johannes Hilpert, Andreas Hoelzer, Karsten Linzmeier and Claus Spenger. An Introduction To MP3 Surround. Fraunhofer Institute for Integrated Circuits IIS, 91058 Erlangen, Germany. Evaluation of the Fraunhofer IIS MP3 Surround Software. December 02, 2004

[Dix01a] Simon Dixon. A Lightweight Multi-agent Musical Beat Tracking System. Austrian Research Institute for Artificial Intelligence, Schottengasse 3, A-1010 Vienna, Austria. Sep 2001

[Dix01b] Simon Dixon. An Empirical Comparison of Tempo Trackers. 8th Brazilian Symposium on Computer Music, Fortaleza, Brazil, pp 832-840, August 2001

[Dix01c] Simon Dixon. Automatic Extraction of Tempo and Beat from Expressive Performances. Journal of New Music Research, 30 (1), pp 39-58. Aug 2001

[Dix02] S. Dixon. An Interactive Beat Tracking and Visualization System. In Proc. Int. Computer Music Conf. (ICMC), pages 215–218, Habana, Cuba, 2002. ICMA.

[Dus04] Dustin Lang and Nando de Freitas. Beat Tracking the Graphical Model Way. Neural Information Processing Systems, NIPS 2004, December 13-18, 2004

[Fin94] T. Finin, R. Fritzon, D. McKay and R. McEntire. KQML as an Agent Communication Language. 3rd International Conference on Information and Knowledge Management (CIKM94), ACM Press, December 1994

[Fis00] William Fisher. Digital Music: Problems and Possibilities. http://www.law.harvard.edu/faculty/tfisher/Music.html. Accessed on 24 Aug 2006.

[Fon99] Lenny Foner. What's an Agent? http://foner.www.media.mit.edu/people/foner/agents.html. Last modified on 22-01-1999. Accessed on 14-05-2005

[Fra96] Stan Franklin and Art Graesser. Is it an Agent, or just a Program?: A Taxonomy for Autonomous Agents. Proceedings of the Third International Workshop on Agent Theories, Architectures, and Languages, Pages 21 – 35. 1996.

[Fra06] Fraunhofer IIS. MP3: MPEG Audio Layer-3. http://www.iis.fraunhofer.de/amm/techinf/layer3/. Accessed on 28 Sep 2006.

[Fre03] David Frerichs. New MPEG-4 High-efficiency AAC Audio: Enabling new applications. http://www.telos-ystems.com/techtalk/hosted/m4-in-30100%20(M4IF_HE_AAC_paper).pdf. April 2003. Accessed on 14 July 2006.

[Fri06] Ina Fried, Daniel Terdiman. Microsoft’s Zune to Rival Apple’s iPod. http://news.com.com/Microsofts+Zune+to+rival+Apples+iPod/2100- 1041_3-6097196.html. Accessed on 25 September 2006

[Gen94] Michael R. Genesereth and Steven P. Ketchpel. Software Agents. Communication of the ACM, Vol. 37, No. 7. Pages: 48 – ff. July 1994

[Ger02] David Gerhard. Computer Music Analysis. Simon Fraser University School of Computing Science Technical Report CMPT TR 97-13. July 2002

[Ger03a] David Gerhard. Audio Signal Classification: History and Current Techniques. Technical Report TR-CS 2003-7, University of Regina Department of Computer Science, November 2003

[Ger03b] David Gerhard. Silence as a Cue to Rhythm in the Analysis of Speech and Song. Journal of the Canadian Acoustical Association, 31:3 p22-23. 2003

[Goo95] Richard Goodwin. Formalizing Properties of Agents. Journal of Logic and Computation 5(6):763-781. 1995

[Got99] Masataka Goto and Yoichi Muraoka: Real-time Beat Tracking for Drumless Audio Signals: Chord Change Detection for Musical Decisions, Speech Communication, Vol.27, Nos.3-4, pp.311-335, April 1999.


[Gro95] Benjamin N. Grosof, David W. Levine, Hoi Y. Chan, Colin J. Parris, and Joshua S. Auerbach. Reusable Architecture for Embedding Rule-based Intelligence in Information Agents. Proc. of the Workshop on Intelligent Information Agents, ACM Conf. on Information and Knowledge Management (CIKM-95), Dec. 1995.

[Hen00] James Hendler and Deborah L. McGuinness. “The DARPA Agent Markup Ontology Language”. In IEEE Intelligent Systems Trends and Controversies, 2000. http://ksl.stanford.edu/people/dlm/papers/ieee-trends-daml-final-version.doc. Accessed on 27 Sep 2006.

[IBM05a] IBM Corporation. http://www.research.ibm.com/iagents/. Accessed on 28-06-2005.

[IBM05b] IBM Corporation. http://alphaworks.ibm.com/tech/commonrules. Accessed on 28-06-2005

[IFP04] IFPI. IFPI Sales Report 2004. http://www.ifpi.org/site-content/library/worldsales2004.pdf. Accessed on 29 Aug 2006.

[IFP05] IFPI. IFPI Sales Report 2005. http://www.ifpi.org/site-content/library/worldsales2005-ff.pdf. Accessed on 29 Aug 2006.

[Int03] Xinmin Tian, Yen-Kuang Chen, M. Girkar, S. Ge, R. Lienhart and S. Shah (Intel Corporation). Exploring the use of Hyper-Threading technology for multimedia applications with Intel OpenMP compiler. Parallel and Distributed Processing Symposium, 2003. Proceedings. International, pp 8. April 2003


[Jen95] Nicholas R. Jennings and Michael J. Wooldridge. Agent Theories, Architectures, and Languages: A Survey. Proceedings of the workshop on agent theories, architectures, and languages on Intelligent agents. Amsterdam, The Netherlands. 1995

[Jen98] Nicholas R. Jennings, Katia Sycara and Michael Wooldridge. A Roadmap of Agent Research and Development. Autonomous Agents and Multi-Agent Systems, Volume 1 , Issue 1, Pages: 7 – 38. 1998

[Jen00a] N. R. Jennings. On Agent-Based Software Engineering. Artificial Intelligence 117 (2000) 277–296, Department of Electronics and Computer Science, University of Southampton, UK, 2000

[Jen00b] Nicholas R. Jennings and Michael J. Wooldridge, Agent-Oriented Software Development. Proceedings of the 9th European Workshop on Modelling Autonomous Agents in a Multi-Agent World: Multi-Agent System Engineering ({MAAMAW}-99) pg 1-7, 2000

[Kotz99] David Kotz and Robert S. Gray. Mobile Agents and the Future of the Internet. ACM Operating Systems Review, 33(3) pages 7-13. August 1999

[Kun02] T. Kanungo, D. M. Mount, N. Netanyahu, C. Piatko, R. Silverman and A. Y. Wu. An efficient k-means clustering algorithm: Analysis and implementation. IEEE Trans. Pattern Analysis and Machine Intelligence, 24 (2002), 881-892.


[Lea07] Artificial Neural Networks http://www.learnartificialneuralnetworks.com/. Accessed on 4 November 2007

[Lib97] Henry Lieberman. Autonomous Interface Agents. Proceedings of the SIGCHI conference on Human factors in computing systems. Atlanta, Georgia, United States. Pages: 67 - 74 1997.

[Luc03] Michael Luck, Peter McBurney and Chris Preist. Agent Technology: Enabling Next Generation Computing: A Roadmap for Agent-Based Computing Version 1.0. http://www.agentlink.org/roadmap/al2/index.html. Jan 2003

[Luc04] Michael Luck, Peter McBurney and Chris Preist. Agent Technology Roadmap: Overview and Consultation Report. http://www.agentlink.org/roadmap/download.html. Dec 2004

[Mae94] Pattie Maes. Modeling adaptive autonomous agents. Artificial Life Volume 1 , Issue 1-2. Pages: 135 – 162. 1994

[Mag05] William Magro, Paul Petersen, Sanjiv Shah. Hyper-Threading Technology: Impact on Compute-Intensive Workloads. http://www.intel.com/cd/ids/developer/asmo-na/eng/20442.htm. Accessed on 30/05/2005.

[Mar01] Sylvain Marchand. An Efficient Pitch-Tracking Algorithm Using a Combination of Fourier Transforms. Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFX-01), Limerick, Ireland, pages 170-174. December 6-8, 2001


[Mar05] Duncan Martell (Reuters). Lawsuit Claims Apple Violates Law with iTunes. http://www.findarticles.com/p/articles/mi_zdewk/is_200501/ai_n8595680. Jan 2005. Accessed on 26 September 2006.

[McL03] McIvor, C., McLoone, M. and McCanny, J.V. Fast Montgomery modular multiplication and RSA cryptographic processor architectures. Conference Record of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers. Volume 1, pages 379-384. Nov 2003

[Mic05] Microsoft Corporation. Microsoft Windows Media http://www.microsoft.com/windows/windowsmedia. Accessed 15 May 2005

[Mid00] S.E. Middleton. Interface Agents: A review of the field. Technical Report ECSTRIAM01-001, University of Southampton, Aug 2000.

[Moo05] Moodlogic Forums. http://forums.moodlogic.net/thread.jsp?forum=7&thread=2341&start=15&msRange=15. Accessed on 13 Aug 2005

[Mus06] MusicBrainz. http://musicbrainz.org/doc/ClassicTagger. Accessed on 28 December 2006.

[Mus07] MusicBrainz Picard. http://musicbrainz.org/doc/PicardTagger. Accessed on 05 November 2007.

[Nec94] Robert Neches. The Knowledge Sharing Effort. http://www-ksl.stanford.edu/knowledge-sharing/papers/kse-overview.html. Accessed on 19-06-2006.

[Nil05] Martin Nilsson. The short history of tagging. http://www.id3.org/history.html. Accessed on 14 May 2005

[Nol02] Stefano Nolfi. Power and the limits of reactive agents. Neurocomputing Volume 42 Number 1 - 4, pages 119-145. January 2002.

[Nwa96] Hyacinth S. Nwana. Software Agents: An Overview. Knowledge Engineering Review, Vol. 11, No 3, pp. 205-244, Nov 96.

[Ogg01] Ogg Vorbis Team. Ogg field specification. Web notes from the developers. http://www.xiph.org/ogg/vorbis/doc/v-comment.html. February 2001. Accessed on 2 July 2006.

[Ogg03] Ogg Vorbis Team. Ogg Vorbis frequently asked questions. Web notes from the developers. http://www.vorbis.com/faq/. October 2003. Accessed on 2 July 2006.

[Opp71] Oppenheim, A., Johnson, D. and Steiglitz, K. Computation of spectra with unequal resolution using the fast Fourier transform. Proceedings of the IEEE, Volume 59, Issue 2, pages 209-301. Feb 1971.

[Par05] B. Pardo and W. Birmingham. Modeling Form for On-line Following of Musical Performances. In Proceedings of the Twentieth National Conference on Artificial Intelligence pages 1018-1023, Pittsburgh, Pennsylvania, July 9-13, 2005.

[Pas06] Adam Pash (About.com). WMA (Windows Media Audio). http://mp3.about.com/od/glossary/g/wma.htm. Accessed on 26 November 2006.

[Pet05] Resources to Accompany Musical Analysis and Synthesis in Matlab. http://amath.colorado.edu/pub/matlab/music/. Accessed on 03-10-2005.

[Pol07] Robi Polikar. The Wavelet Tutorial Part I. http://engineering.rowan.edu/~polikar/WAVELETS/WTpart1.html. Accessed on 03-10-2007.

[Pre05] Predixis MusicMagic. http://www.predixis.com. Accessed on 25 June 2005.

[Pre07] Predixis MusicIP. http://www.musicip.com/mixer/index.jsp. Accessed on 9 August 2007.

[Rao96] Anand S. Rao. AgentSpeak(L): BDI Agents speak out in a logical computable language. Proceedings of the 7th European Workshop on Modelling Autonomous Agents in a Multi-Agent World (LNAI Volume 1038), pages 42-55. 1996.

[Rel05] Relatable TRM. http://www.relatable.com/. Accessed on 15 Aug 2005

[Ria05] The Recording Industry Association of America. 2005 U.S. Manufacturers' Unit Shipments and Value Chart. http://76.74.24.142/8230EB0F-3012-63C0-CCA5-AD966FAAF739.pdf. 2005. Accessed on 1 October 2006.

[Ria06] The Recording Industry Association of America. Napster Case. http://www.riaa.com/News/filings/napster.asp. Accessed on 14 Sep 2006.

[Rus03] Stuart Russell and Peter Norvig. Artificial Intelligence – A Modern Approach. Prentice Hall, Upper Saddle River, NJ. ISBN 0137903952. Published 2003.

[Sim05] Steve Simmons. Section Six: Beyond RISC - Search for a New Paradigm. http://www.sasktelwebsite.net/jbayko/cpu6.html. Accessed on 25/05/2005.

[Smi03] Smith, J.O. Mathematics of the Discrete Fourier Transform (DFT), http://ccrma.stanford.edu/~jos/mdft/, 2003, ISBN 0-9745607-0-7.

[Son04] Sonic Solutions. Napster Goes Live in the UK Today. http://www.sonic.com/about/press/news/2004/may/napster-uk.aspx. May 2004. Accessed on 26 November 2006.

[Sym02] Andreas L. Symeonidis, Pericles A. Mitkas and Dionisis D. Kechagias . Mining Patterns And Rules for Improving Agent Intelligence Through An Integrated Multi-Agent Platform. In: 6th IASTED International Conference, Artificial Intelligence and Soft Computing (ASC 2002), 17-19 Jul 2002, Banff, Alberta, Canada.

[Usd04] U.S. Department of Commerce. A Nation Online: Entering the Broadband Age. https://www.esa.doc.gov/Reports/NationOnlineBroadband04.htm. Sep 2004. Accessed on 18 June 2006.

[Via06] Via Licensing. MPEG-4 Audio Licensing FAQ. http://www.vialicensing.com/Licensing/MPEG4_FAQ.cfm?faq=6#6. Accessed on 28 December 2006.


[Whi97] James E. White. Software agents. ISBN:0-262-52234-9. 1997
