UNIVERSITY OF CALGARY

EEIA: The Extended Efficiency Improvement Advisor

by

Nicholas Nygren

A THESIS

SUBMITTED TO THE FACULTY OF GRADUATE STUDIES

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE

DEGREE OF MASTER OF SCIENCE

GRADUATE PROGRAM IN COMPUTER SCIENCE

CALGARY, ALBERTA

JULY, 2018

© Nicholas Nygren 2018

Abstract

In the past, the Efficiency Improvement Advisor (EIA) has been successfully applied to several dynamic problems. By learning recurring tasks, it was able to correct inefficient behavior in multi-agent systems. We present an extension to the advisor which allows certain known-ahead knowledge to be exploited. This extension unobtrusively guides autonomous agents to follow a plan, while retaining the dynamic abilities of those agents. Unlike other similar approaches which introduce planning functionality, this does not require always-on communications. The extended advisor's planning abilities work in tandem with the original learning abilities to create additional efficiency gains. The abilities of the extended advisor (including the introduction of planning, the preservation of dynamism, and mixing certain knowledge with learned knowledge) are evaluated in 2 different problem domains. First, the advisor is applied to the familiar arcade game: Whack-a-mole. Then, Pickup and Delivery is considered, which is similar to coordinating a taxi service.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
1 Introduction
2 Definitions and Basic Concepts
  2.1 The Problem: Dynamic Task Fulfillment
  2.2 Multi Agent Systems
  2.3 The Efficiency Improvement Advisor
    2.3.1 The extract step
    2.3.2 The optimize step
    2.3.3 The derive step
3 The Extended Efficiency Improvement Advisor
  3.1 The Working Cycle of the EEIA
  3.2 The derive step
  3.3 Simple Example
4 Simple Application: Whack-a-mole
  4.1 The Environment
  4.2 Dynamic Task Fulfillment Perspective
    4.2.1 Task Similarity
    4.2.2 Optimization Criterion
    4.2.3 Bounding Function
  4.3 Agents
  4.4 Required Modifications of Agents for Working with An Advisor
    4.4.1 Providing Histories
    4.4.2 Handling Rules
    4.4.3 Rule Conflicts
    4.4.4 New Agents
5 Complex Application: Pickup and Delivery
  5.1 The Environment
  5.2 Dynamic Task Fulfillment Perspective
    5.2.1 Task Similarity
    5.2.2 Optimization Criterion
  5.3 Agents
  5.4 Required Modifications of Agents for Working with An Advisor
    5.4.1 Providing Histories
    5.4.2 Handling Rules
    5.4.3 Rule Conflicts
6 Experiments
  6.1 Generating Base Instances
    6.1.1 Whack-a-mole
    6.1.2 Pickup and Delivery
  6.2 Degree of Dynamism
    6.2.1 Whack-a-mole
    6.2.2 Pickup and Delivery
  6.3 Static Variant
    6.3.1 Whack-a-mole
    6.3.2 Pickup and Delivery
  6.4 Full Runs
    6.4.1 Whack-a-mole
    6.4.2 Pickup and Delivery
    6.4.3 More Vehicles
7 Related Works
  7.1 Control Theory
  7.2 Whack-a-mole
  7.3 Pickup and Delivery
    7.3.1 How pre-determined are an agent's actions?
    7.3.2 How does the system handle new requests mid-execution?
    7.3.3 What is the impact of an agent/component failing?
    7.3.4 What happens when communication fails mid-execution?
    7.3.5 Which agents require constant communication?
    7.3.6 How much variation is there in vehicle size?
    7.3.7 How can tasks be given emphasis or priority?
    7.3.8 What is the optimization criterion?
    7.3.9 What limitations are there on simultaneous jobs?
    7.3.10 What ability is there to cancel a task?
8 Conclusion
Bibliography
A Optimization Methods
  A.1 Genetic Algorithm
  A.2 Branch and bound

List of Tables

5.1 Customer requests made in advance, to allow planning.
5.2 Based on the known-ahead requests, the driver can make a plan.
5.3 In retrospect, all fares fulfilled, including dynamic ones.
6.1 Advised hammer FIFO in the static variant of Whack-a-mole, quality relative to a plan.
6.2 Advised 2 hammer NN in the static variant of Whack-a-mole, quality relative to a plan.
6.3 Average efficiency of executing plans in the static variant of PDPTW.
6.4 The effect of δ on the knowledge set.
6.5 Comparison of the quality (Avg. Hits better than base system) of solutions achieved by base system (single hammer FIFO), system with EIA and system with EEIA for larger values of m.
6.6 Comparison of the quality (Avg. Hits better than base system) of solutions achieved by base system (multi-hammer NN), system with EIA and system with EEIA for larger values of m.
6.7 Comparison of the quality (total time over all run instances of an experiment) for Pickup and Delivery with 2 vehicles and measuring efficiency by time, varying values of m.
6.8 Comparison of the quality (total time over all run instances of an experiment) for Pickup and Delivery with 2 vehicles and measuring efficiency by distance, varying number of vehicles.
A.1 Some parameters adjusted for problem size, where m is the number of tasks.

List of Figures and Illustrations

2.1 A Multi Agent System.
2.2 A Multi Agent System under advice.
2.3 The actions of the original advisor, the EIA (from [SD+10]).
2.4 Types of Exception Rules.
3.1 The actions of the EEIA.
3.2 High-level example run instance.
4.1 Example mole schedule, and comparison of hammer choices.
4.2 The Whack-a-mole environments, varying sizes (not required to be square).
4.3 The application of an ignore rule, ¬a_{ta_j}, to a queue for Ag^fifo.
4.4 The application of an ignore rule, ¬a_{ta_j}, to a list of task distances for Ag^nn.
4.5 The application of a proactive rule to a queue for Ag^fifo.
4.6 The application of a proactive rule to a list of task distances for Ag^nn.
4.7 An ignore rule conflicting with a proactive rule for the same task. This is the difference made by order of rule application.
5.1 Larger cities and towns on the Trans-Canada, labeled with distances separating them.
5.2 The plan, shown as the graph of the highway extended over the day (a space-time diagram). A blue line is used to represent the lifeline/trajectory of the vehicle.
5.3 An example fare, illustrating the actual time window.
5.4 A sequence of requests being fulfilled by a single vehicle, some known in advance, some dynamic.
5.5 The standard pickup and delivery environment.
5.6 The infochemical gradient. The vehicle's infochemicals are in red, the pickup location's infochemicals are in green, and the delivery location's infochemicals are in blue. The depot's infochemicals are omitted from the diagram.
5.7 The application of an ignore rule to a utility calculation for Ag^DIC.
5.8 The application of a proactive rule to a utility calculation for Ag^DIC.
6.1 The Degree of Dynamism experiment performed on varying problem sizes, using single hammer FIFO under advice (using Branch-and-Bound). Each figure is a single bri. Red line indicates performance without advice.
6.2 The Degree of Dynamism experiment performed on varying problem sizes, using multi hammer NN under advice (using GA). Each figure is a single bri. Red line indicates performance without advice.
6.3 The Degree of Dynamism experiment repeated on the 5×4, using multi hammer NN under advice (using GA), with 3 different choices of bri. Each figure is a single bri. Red line indicates performance without advice.
6.4 The Degree of Dynamism experiment performed on varying problem sizes, using DIC under advice (using GA). Each figure is a single bri. Red line indicates performance without advice.
6.5 The Degree of Dynamism experiment repeated for 30 tasks in Pickup and Delivery with time measurement, with 5 more choices of bri. Each figure is a single bri. Red line indicates performance without advice.
6.6 The coin-flipping procedure for generating run instances with partially recurring tasks.

Chapter 1

Introduction

Both industry and games have plentiful examples of problems where action is required despite the fact that the problem itself is changing over time, or a complete description of the circumstances is not initially available. Many of these problems describe machines that break unpredictably and must be repaired or replaced. Other similar problems describe services where customers continually make requests, which are then satisfied by workers. The transportation industry also has many of these problems. Practically every type of transportation work has a variant where jobs must begin before all requests are received.

Many, if not all, of these problems can be described under the Dynamic Task Fulfillment (DTF) formalization [SD+10]. This is a very general and flexible method for describing problems in which knowledge is revealed over time.

Generally, solutions to dynamic problems (such as DTF problems) fall into two basic classes. The first is Multi-Agent Systems (MAS), where a collection of usually autonomous agents is deployed [WH04]. They complete the required work by making a sequence of locally good decisions based on the limited knowledge available. These solutions have the advantage of being highly flexible and resilient, at the cost of efficiency due to the local perspective. The second class of solutions includes those where a central optimization is done based on whatever knowledge is available. Plans like this must be re-optimized and redistributed every time new knowledge is acquired. Central re-optimization solves the inefficiency of local decisions, but introduces a single point of failure and a heavy reliance on communication, in addition to a heavy computational load which is not distributed.

One previous effort to capture the positive qualities of both types of solutions was the Efficiency Improvement Advisor (EIA) [SD+10]. The EIA is an advice and analytics component which gathers observations from agents, compiles these observations into a broader view of the world, and makes incremental corrections (via exception rules) to the behavior of these agents. It works on autonomous agents and gives a partial efficiency gain in the direction of the centrally optimized solutions, but without becoming an essential component of the system.

In this thesis we present an extension to the EIA which allows the inclusion of certain knowledge into the existing knowledge of agent testimonies. Exploiting disparate sources of knowledge allows a further improvement in efficiency, while preserving the previous strengths of the EIA. Specifically, this extension does not result in an essential component or centralized controller. Where a MAS solution does not have a single point of failure, the addition of the EIA or this extension does not create a single point of failure. Additionally, allowing this inclusion of certain knowledge is not only a further efficiency improvement in the average case, but is also a direct competitor to the centrally optimized systems in those cases where complete knowledge of the future is available.

We call this extension the Extended Efficiency Improvement Advisor (EEIA).

To verify these qualities of the EEIA we have used 2 different concrete instantiations of DTF: Whack-a-mole and Pickup and Delivery. The EEIA is defined at the level of DTF, which is a broad class of problems, so it is natural to question whether results achieved in one setting are to be expected across all of DTF. To address this concern, an empirical evaluation in 2 different settings shows broader generality of results. Furthermore, strong and consistent results are seen using 3 different quality measures: distance, time, and hit/miss ratio.

The remainder of this thesis is structured as follows. Chapter 2 contains basic concepts and formal definitions for MAS, Dynamic Task Fulfillment, and the original Efficiency Improvement Advisor. Chapter 3 gives the formal description of the main contribution of this thesis, the Extended Efficiency Improvement Advisor. Chapter 4 describes the game of Whack-a-mole, how this game appears through the lens of Dynamic Task Fulfillment, and the dynamic solutions (Nearest-Neighbor and First-In-First-Out) which will be receiving advice. Chapter 5 mirrors Chapter 4 in structure while it describes an important industrial application, Pickup and Delivery. The dynamic solution receiving advice for this problem is Digital Infochemical Coordination. Chapter 6 describes experiments performed in both Whack-a-mole and Pickup and Delivery, giving empirical evidence of the EEIA's abilities. Chapter 7 covers the related works. Finally, Chapter 8 contains the conclusions and possible future works.

Chapter 2

Definitions and Basic Concepts

In this chapter, we will formally introduce the general problem we are interested in: Dynamic Task Fulfillment. It is a concept at a level high enough to encompass both real-world industrial problems and games. Since the other concepts discussed in this thesis are meant to be as broadly applicable as possible, they will also be discussed at this level of abstraction.

A frequently chosen approach to these types of problems is using Multi Agent Systems. Multi Agent Systems (MAS) are technically general enough to describe any approach, and also offer a formalism and point of view which are extremely useful for dynamic settings.

Finally, the Efficiency Improvement Advisor concept will be given as an enhancement for MAS when applied to Dynamic Task Fulfillment in general. Combined, these three concepts form the foundation for the subject of this thesis.

2.1 The Problem: Dynamic Task Fulfillment

In Dynamic Task Fulfillment (DTF) [SD+10] a set of agents, A, is responsible for solving an ordered set of tasks, T, with the added difficulty that these tasks are not necessarily known in advance and will only be revealed over time. This sequence of tasks, which are revealed to the agents, is called a Run Instance. Intuitively, a Run Instance can be thought of as a single day of work. Typically, the expectation is that the agents will solve every task in the Run Instance, but this depends on how the success of the agents is measured. Expanding on this analogy, a sequence of Run Instances (or days) is called a Run, and can be used to represent a week or a month of work. Representing problems in a sequence this way offers the possibility for patterns to exist over longer time periods, and for learning to be done on those patterns.

Formally, each run instance occurs in a discrete time interval:

Time = [0, end] ⊆ Z,

where end depends on the run instance and the definitions of the specific problem domain.

Definition 2.1.1 (Run Instance). A Run Instance is an m-length list of Events,

ri_j = (ev_1, ev_2, ..., ev_m).

where,

Definition 2.1.2 (Event). An Event is a task/time pair ev_i = (ta_i, t_i), where t_i ∈ Time indicates when the task is announced.

and,

Definition 2.1.3 (Task). A Task, ta_i = (detail_i, t^start_i, t^end_i), t^start_i, t^end_i ∈ Time, contains a time window in which the task must be fulfilled, and other problem-dependent attributes in detail_i.

Combining this, a run instance, ri, will have the form:

ri = ((ta_1, t_1), (ta_2, t_2), ..., (ta_m, t_m))

Finally,

Definition 2.1.4 (Run). A Run is a k-length list of Run Instances, run = (ri_1, ri_2, ..., ri_k).

Where a run, r, will have the form

r = (((ta_{1,1}, t_{1,1}), (ta_{2,1}, t_{2,1}), ..., (ta_{m_1,1}, t_{m_1,1})),
     ...,
     ((ta_{1,k}, t_{1,k}), (ta_{2,k}, t_{2,k}), ..., (ta_{m_k,k}, t_{m_k,k}))).
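To make these definitions concrete, they can be transcribed almost directly into code. The following is a minimal sketch (the thesis prescribes no implementation; the Python names below are purely illustrative):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Task:
    detail: object        # problem-dependent attributes (Definition 2.1.3)
    t_start: int          # opening of the time window
    t_end: int            # closing of the time window

@dataclass
class Event:
    task: Task
    announced_at: int     # t_i in Definition 2.1.2

# A run instance is an m-length list of events (Definition 2.1.1),
# and a run is a k-length list of run instances (Definition 2.1.4).
RunInstance = List[Event]
Run = List[RunInstance]
```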

The responses of the agents in A to the events in a run instance result in an emergent solution, sol, of the run instance:

Definition 2.1.5 (Solution). A Solution is a sequence of assignments, sol = (as_1, ..., as_m), describing which agents, in A, uniquely fulfill which tasks, in T.

where,

Definition 2.1.6 (Assignment). An Assignment is a triple, as_i = (ta'_i, Ag'_i, t'_i), which means task ta'_i was started by agent Ag'_i at time t'_i.

Combining these definitions, a solution will have the form:

sol = ((ta'_1, Ag'_1, t'_1), (ta'_2, Ag'_2, t'_2), ..., (ta'_m, Ag'_m, t'_m))

where ta'_i ∈ {ta_1, ..., ta_m}, ta'_i ≠ ta'_j for all i ≠ j, Ag'_i ∈ A, t'_i ≤ t'_{i+1}, t'_i ∈ Time.

To determine the efficiency of the agents in A in solving a run instance, we need to associate with each solution sol a quality measure qual(sol).

Definition 2.1.7 (Solution Quality). The quality of a solution is a function, qual : Solution → R, used as the optimization criterion.

The convention is that a lower qual(sol) indicates a more efficient solution, and where sol_opt is the optimal solution, qual(sol_opt) ≤ qual(sol) for any other solution sol. It should also be expected that very similar solutions will have very similar efficiencies (most of the time). Beyond these rough guidelines, there is enormous flexibility in the choice of the qual function.

It should be noted that these definitions leave many design choices open:

• Are time windows open or closed?

• Can more than one agent fulfill a task?

• Can an agent fulfill more than one task?

Many similar questions could also be asked. The important thing to realize is that any combination of answers to these questions can be represented within the Dynamic Task Fulfillment structure. We do not acknowledge these constraints at this level of abstraction. The responsibility for enforcing these things would normally be left to the solution quality measure, as they are all problem-domain specific. We have chosen to enforce a condition of "no more than one assignment per task" within our optimizers.

2.2 Multi Agent Systems

The foundational concept, which Dynamic Task Fulfillment and the majority of other concepts in this thesis are built upon, is Multi Agent Systems [SD+10]. These are systems where a task is given to a collection of separate but interacting components called agents. These agents usually make simpler local decisions, but in some exceptional cases can have a global perspective. They can be deterministic, random, or intelligent. They can be controlled centrally, but are often autonomous. Sometimes communication between agents is available at all times, sometimes it is only partially available. Sometimes it comes at a cost, and sometimes communication is forbidden completely.

[Figure: agents Ag_1, Ag_2, Ag_3, ..., Ag_K interacting with the environment Env.]

Figure 2.1: A Multi Agent System.

It is sometimes the case that no individual agent is capable of completing the required work, but the system as a whole can achieve it. The alternative is problems where a single agent could solve the entire problem, but several cooperating agents can solve it faster. This leaves a choice of how agents organize themselves, with or without communication.

What follows is a very general definition for agent [Hud11], which could likely be used in any application, and any of the cases described above:

Definition 2.2.1 (Agent). An Agent Ag = (Sit, Act, Dat, f_Ag), where:

• Sit - the set of possible situations the agent can perceive (from its environment)

• Act - the set of possible actions the agent can perform

• Dat - the set of possible value combinations of the agent's internal data areas

• f_Ag : Sit × Dat → Act - the agent's decision function

And many of these agents working together is formally defined as:

Definition 2.2.2 (Multi Agent System). A Multi Agent System is a pair (A, Env), where:

• A - the set of all agents {Ag_1, Ag_2, ..., Ag_K}

• Env - the set of possible states of the environment.

A set of agents interacting with their environment is illustrated in Figure 2.1.

While potentially any computational problem could be addressed in this way, this formalism is particularly good for approaching dynamic problems, such as Dynamic Task Fulfillment problems. It has lower communication requirements, is more fault-tolerant, and is easier to scale [WH04].

In most practical cases of Dynamic Task Fulfillment run instances, there will be events (ta_i, t_i) and (ta_j, t_j) where t_i < t_j. The implication is that not all information will be made available at once, and certainly not at the beginning of the time interval. Approaches to static variants of these problems are no longer useful, as they require full knowledge to generate a full solution. Static, in this context, refers specifically to those problems where all knowledge is available in advance (also known as a priori optimization [BC+07]).

Leaning on transportation for examples of Dynamic Task Fulfillment problems, it is considered a basic strategy (in [BCL10]) to apply one of the aforementioned static approaches, and simply re-optimize and re-distribute each time new tasks are revealed. Some authors ([MS+10], [DPH07]) describe this as non-agent-based; however, it is better to refer to these as cases which are based on agents that are non-autonomous, as this will help keep the various downsides in mind:

1. Optimization is computationally expensive; re-optimization is often equally expensive.

2. Redistributing plans may happen very frequently, and be very expensive in terms of communication.

3. Unless every agent is receiving a revised plan, no agent should. Chaos would result if agents were following incompatible plans. This requires very reliable communications.

The alternative to iterating a static strategy is to use a strategy where agents are autonomous and self-organizing. One such strategy (which could possibly be applicable across all of Dynamic Task Fulfillment) is the Contract Net Protocol [Sm80]. The Contract Net Protocol works as an auction simulation. Each time a task is announced, agents calculate their own estimated cost of being assigned the task and use this value to determine a bid. The winner of the auction is offered the assignment (which it may reject, triggering a second round of bidding).
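As an illustration, one auction round of a much-simplified Contract Net might look as follows (a sketch only; estimated_cost and accept are assumed agent methods, and the actual protocol in [Sm80] is richer than this):

```python
def contract_net_round(task, agents):
    """One auction round of a simplified Contract Net Protocol.

    Each agent bids its own estimated cost of fulfilling the task;
    the cheapest bidder is offered the assignment and may decline,
    which triggers another round among the remaining agents.
    """
    bidders = list(agents)
    while bidders:
        bids = [(agent.estimated_cost(task), agent) for agent in bidders]
        cost, winner = min(bids, key=lambda bid: bid[0])
        if winner.accept(task, cost):
            return winner
        bidders.remove(winner)      # rejection: re-run the auction
    return None                     # no agent would take the task
```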

This and other dynamic strategies can often mitigate or entirely eliminate the concerns above.

1. The computational load is distributed over many "processors".

2. Communication failures and other unexpected problems will often be compensated for by the normal behavior of the agents.

While this sounds attractive, there is a trade-off faced when choosing these strategies. The purely dynamic strategies with autonomous agents rely on locally good decisions, which in general produce much less efficient solutions than the centralized, static strategies.

In a way that is analogous to the iterative re-optimization (which adapted static approaches to work on dynamic problems), there are also counters to the shortcomings of dynamic approaches. It is possible to adapt a dynamic approach, improving its efficiency at least partially toward that of a static approach. This is where the Efficiency Improvement Advisor comes into play.

2.3 The Efficiency Improvement Advisor

The EIA is an advice and analytics component [SD+10]. It is positioned so that it does not require any far-reaching vision, nor does it require the ability to communicate over any long distance. All information about what has taken place in the environment comes post-mortem. Other agents in the system deliver their histories to the EIA at the end of each day (Run Instance), when they return home (or when communications become available). Agents interacting with their environment under advice from the EIA are illustrated in Figure 2.2.

[Figure: the advisor Ag_EIA sits above agents Ag_1, Ag_2, Ag_3, ..., Ag_K, which interact with the environment Env.]

Figure 2.2: A Multi Agent System under advice.

Similarly, any instruction given to other agents by the EIA is done prior to deployment, when they are near or communication is available. It is essential to the concept of the EIA that it remain a non-vital component of the system. If the system did not have a single point of failure before the EIA was added, it should not have one after.

Any repetition that occurs sufficiently often to be detected in the histories provided by agents is knowledge for the EIA. This knowledge is used to decide what instructions to give, and what corrections to make in agent behaviors. Corrections are then done by providing the agents with exception rules (see Section 2.3.3).

The 6 main actions of the EIA are illustrated in Figure 2.3. The EIA will gather knowledge about the world (not directly, though; it is provided by other agents). This knowledge is processed into a plan, and converted into exception rules, which are then distributed to the agents before they are deployed.

[Figure: the advisor's pipeline - receive local agent histories; transform local agent histories into a global history; extract recurring tasks from the global history; optimize a solution of the recurring tasks; derive rules from the optimal solution; send derived rules to agents - around a shared data model (advisor states, agent knowledge, environment knowledge, rule sets, intermediate results, ...), sitting above the basic MAS.]

Figure 2.3: The actions of the original advisor, the EIA (from [SD+10]).

Each step is described in more detail:

• receive - Collect an individual history H_i from each agent under advice. H_i will contain every observation and action made during the prior run instance, as well as internal states and individual decisions.

• transform - Takes the local histories, H_1, ..., H_n, of each agent and combines them into a Global History, GHist, essentially a sequence of run instances.

• extract - Distills from GHist the recurring knowledge set (ta^rec_1, ..., ta^rec_p) (see Section 2.3.1).

• optimize - Computes an optimized solution opt^rec = ((ta^rec_1, Ag'^rec_1, t'^rec_1), ..., (ta^rec_p, Ag'^rec_p, t'^rec_p)), Ag'^rec_j ∈ A, t'^rec_j ∈ Time, for the recurring tasks found in the previous step. If the previous emergent solution, last, does not have sufficient room for improvement, Ag_EIA will not create new rules. This is decided by a constant ratio, qual_thresh. Formally, the rules are created if qual(last)/qual(opt^rec) > qual_thresh (a one-line check, sketched after this list).

• derive - Creates for each agent Ag_i a set R_i of exception rules, if the agents did not perform well. Note that an R_i can also be empty.

• send - Communicates the set R_i to Ag_i for each agent the next time communication with it is possible.
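As referenced in the optimize step above, the room-for-improvement test is a single ratio comparison; as a sketch (with qual, last, opt_rec, and qual_thresh as defined above):

```python
def should_create_rules(last, opt_rec, qual, qual_thresh):
    # Lower qual is better; rules are only worth deriving when the emergent
    # solution is sufficiently worse than the optimized recurring plan.
    return qual(last) / qual(opt_rec) > qual_thresh
```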

These steps will be executed after each run instance in a run, in preparation for the next, the implication being that this is done during the vehicles' down time. Since it is not being done concurrently with the normal work, the agents should be available for communication (even though they may not be available while fulfilling tasks). This is one of the 4 main requirements of the EIA [Hud11]:

1. Agents must be able to communicate (histories and rules) with the EIA between run instances.

2. Agents must be modified to interpret and follow exception rules.

3. Runs must contain a subset of recurring events.

4. Recurring event-sets in runs must be constant (over long enough intervals) to allow incremental correction.

The receive and send steps are simply communications between the EIA and other agents; they are no more complicated than exchanging information. The transform step is also very simple: the feedback from each agent is compiled into a record of everything that has happened in the run so far.

The remaining 3 steps are complicated enough to be described separately.

2.3.1 The extract step

The extract step is when the EIA's learning occurs. This involves conversion of the histories gathered in the receive and transform steps into a list of tasks. This step is responsible for detecting any task which occurs sufficiently often in the run. These tasks are predicted to occur in near-future run instances, and we refer to them as recurring tasks.

Any algorithm (clustering or otherwise) used to this end will have the following form in relation to Dynamic Task Fulfillment:

Definition 2.3.1 (Clustering Algorithm). Clustering will reduce a list of Run Instances (the global history) to a single run instance, with the intention that the result will include only the recurring Events from the input.

cluster : RunInstance* → RunInstance

The EIA's choice of clustering algorithm is Sequential Leader Clustering [Har75]. It is extremely simple, but effective. The main strength of this algorithm is that it does not need to know the number of clusters in advance. The actual number of clusters can be as low as 0, where no tasks recur, or as high as the entire run instance. Whatever learning method is used to find recurrence is also responsible for determining this number. With this requirement, many other popular clustering algorithms, such as k-means [Har75], would be unsuitable.

An obvious downside to using Sequential Leader Clustering is that the order in which data points are read has an effect on the outcome that is both significant and predictable. If the order were chosen by a malicious attacker, single clusters could become 2 smaller clusters (or vice versa), which would be detrimental to the system's efficiency. Luckily, this is not a concern for the Advisor, as the only source for this data is its own transform step.

The problem-specific aspects of clustering are hidden in a single function.

Definition 2.3.2 (Task Similarity). Task similarity is a function sim : Task × Task → R, which returns 0 for identical tasks and a positive number otherwise.

The convention used by sim is that more similar pairs will return a lower value, bounded below by 0, which indicates that tasks are identical. It is also expected that small changes to tasks will result in relatively small changes to similarity.

Throughout the explanation of this algorithm, the following tunable parameters will be used (recommended values adopted from [SD+10]):

Parameter Name   Description                                                    Recommended Value
minsim           Minimum similarity at which tasks can be clustered together.   10
minocc           Minimum ratio of occurrences to run instances.                 0.7
pastwind         Number of run instances from history to be included.           5

To begin with, we retrieve the most recent run instances from the collection of global histories collected so far. We use a sliding window (of length pastwind) on the most recent run instances in the agent histories to limit those under consideration. Those tasks which were recurring early in the run may not be recurring later. The sliding window gives the ability to forget and to focus on what is recent.

The window of run instances,

((ev_{1,1}, ...), (ev_{2,1}, ...), ..., (ev_{pastwind,1}, ...))
  = (((ta_{1,1}, t_{1,1}), ...), ((ta_{2,1}, t_{2,1}), ...), ..., ((ta_{pastwind,1}, t_{pastwind,1}), ...)),

is flattened into a list of tasks:

(ta_{1,1}, ..., ta_{2,1}, ..., ta_{pastwind,1}, ...)

There is relevant time-data within the task itself, so the flattening does not cause useful information to be lost. If we were interested in tasks which repeat only on odd or even run instances, the flattening would destroy crucial information, and a different approach would be used. Tasks which repeat multiple times per run instance can still be learned post-flattening: each instance will simply be treated as a separate cluster, each repeating once per run instance.

This flattened set of tasks will be iterated over, applying the sim measure to each, against the tasks that have already been examined. In the process, a partitioning of the set will be formed, adding one task at a time. The goal is to place sufficiently similar tasks together, while ensuring that sufficiently different tasks are separated.

Each cluster, c, in the result and intermediate stages will have a single task within it, calculated to be the closest to the center of the cluster. This task will be called the centroid and will be referred to as centroid(c).

For each task, ta:

Case 1: sim(ta, centroid(c)) < minsim - For some existing cluster, the new candidate task, ta, is within the threshold distance of the center. As a consequence, the new task will be added to this cluster, and the cluster's centroid will be adjusted as necessary.

Case 2: sim(ta, centroid(c)) ≥ minsim - For all existing clusters, the task is outside the threshold distance, so it is not added. Instead, a new cluster, with this task as centroid, is created.

Cluster sizes will then be compared to the number of run instances used in the clustering. If k is the number of run instances and size(c) is the number of tasks in the cluster c, the task at centroid(c) is considered recurring if it occurs more frequently than minocc allows, i.e., if size(c)/k > minocc.

The output run instance can then be defined as:

ri^rec = ( (centroid(c), 0) | c ∈ clusters, size(c) > minocc · pastwind )
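The whole extract step can be sketched compactly, assuming a problem-specific sim function and the simplification that a cluster's first member serves as its centroid (the thesis adjusts centroids as clusters grow; this sketch omits that):

```python
def extract_recurring(window, sim, minsim=10, minocc=0.7):
    """Sequential Leader Clustering over a window of run instances.

    window: list of run instances, each a list of (task, announce_time) events.
    Returns the tasks considered recurring, paired with announcement time 0.
    """
    # Flatten the window into one list of tasks (Section 2.3.1).
    tasks = [task for run_instance in window for (task, _) in run_instance]

    clusters = []                                  # each cluster: [centroid, members...]
    for ta in tasks:
        for cluster in clusters:
            if sim(ta, cluster[0]) < minsim:       # Case 1: join an existing cluster
                cluster.append(ta)
                break
        else:                                      # Case 2: start a new cluster
            clusters.append([ta])

    # A centroid is recurring if its cluster is large enough relative
    # to the number of run instances in the window.
    threshold = minocc * len(window)
    return [(cluster[0], 0) for cluster in clusters if len(cluster) > threshold]
```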

2.3.2 The optimize step

The more interesting problems under the Dynamic Task Fulfillment heading are computationally difficult ones. They are often variants of the Travelling Salesman Problem. For larger run instances, it is not possible to find the exact optimal solutions, and for medium-sized run instances, exact optimal solutions can only be found with a significant investment of time.

Definition 2.3.3 (Static Optimizer). Provided agent detail and a Solution Quality measure, the static optimizer is a map:

optimize : RunInstance → Solution

With this in mind, the goal of the optimize step is not to find the exact optimal solutions. Rather, we are going to look for something of acceptable quality. There has been a significant amount of research into approaches finding estimated optimal solutions. And luckily, effort is made to keep these approaches as general as possible, hiding any application-specific details within the solution quality measure.

Popular approaches in the transportation domain include:

• Genetic Algorithms (GA) [HD+10]

• Mixed Integer Programming (MIP) solved by the simplex method [MS+10]

• Branch and Bound (also sometimes Integer Programming) [DDS91]

Branch and Bound is typically considered infeasible because it returns exact solutions (unless modified to return an estimate). It is nevertheless used by the EIA, although not the Integer Programming variety, and not applied to large problems. This is simply done to demonstrate the generality of the EIA in choice of optimizer.

The GA and Branch-and-bound methods are both used in experimental evaluations, and for each it is taken for granted that an application-specific Solution Quality measure has been provided.

qual : Solution → R

This will be a layer of protection for the generality of the static optimizers. All problem-specific details will be hidden behind the qual function.

For more detail on the optimizers used, see Appendix A.

2.3.3 The derive step

While the derive step is nowhere near as computationally intensive as the optimize step, it is easily as complicated in definition. derive is where the result of the previous steps is converted from a sequence of assignments into a set of Exception Rules for each agent.

[Figure: the taxonomy of exception rules. Top level: time-triggered rules, task-triggered rules, and neighborhood-triggered rules; leaf types: forecast, detection, ignore, boost, wait, idle, and path rules.]

Figure 2.4: Types of Exception Rules

There are many types of rules available; see Figure 2.4, originally from [KB+10]. The EIA does not exploit all of these possibilities; rather, two types of rules were found [SD+11] to be sufficient.

We will begin with ignore rules, first introduced in [SD+10].

Definition 2.3.4 (Ignore Rule). An Ignore Rule has the form cond_ig(s,d) → ¬a_ta. cond_ig is a set of conditions on situations, s, and internal agent states, d, where if satisfied, the agent is prevented from executing actions associated with fulfilling task ta. Precisely how ¬a_ta is interpreted by the agent is problem specific.

These ignore rules do not directly give the agent an instruction. Instead they tell the agent what not to do. Apart from the task to be ignored, the agent still retains most of its autonomy. This reduces the possibility for harm in the case that the rule is poorly chosen. Although the rules are beneficial in the context of known tasks, this potential for harm can be re-introduced when dynamic tasks are involved.

Next, we have proactive rules, introduced in [SD+11].

Definition 2.3.5 (Proactive Rule). A Proactive Rule, cond_proa(s,d) → prep(ta), indicates that when the condition, cond_proa, on an agent's situation, s, and internal state, d, is satisfied, the agent should execute an action sequence prep(ta). Like ¬a_ta, the precise interpretation of prep(ta) will be problem specific.

These proactive rules, when in effect, completely override all normal behavior of the agent. Typically this means forcing the agent to go to a certain place at a certain time, which, without complete global knowledge, can be worse for efficiency than letting the agent decide for itself. Although these cases of harm are usually rare, the risk is still higher than it is for ignore rules. For this reason, a stronger preference is given to ignore rules over proactive rules when trying to correct an agent's behavior.

Given the optimized result from the previous step (assuming a knowledge set of size p):

opt^rec = ((ta^1_1, Ag^1_1, t^1_1), ..., (ta^1_p, Ag^1_p, t^1_p))

and the previous emergent result:

last = ((ta^2_1, Ag^2_1, t^2_1), ..., (ta^2_p, Ag^2_p, t^2_p)),

the sequences are compared, element by element. The least j ∈ [1, p] is selected such that Ag^1_j ≠ Ag^2_j or ta^1_j ≠ ta^2_j. If no such j exists, it is unnecessary to generate new rules. This index represents the earliest point in the run instance where the emergent solution deviates from the plan.

To correct this deviation, there are 2 choices: use an ignore rule or use a proactive rule. Ignore rules are attempted first; this will simply be a rule of the form cond_ig(s,d) → ¬a_{ta^2_j}, since ta^2_j was the task that led the agent astray.

If the ignore rule is seen to be ineffective or harmful, it can be removed and replaced with a proactive rule. This will be a rule of the form cond_proa(s,d) → prep(ta^1_j), since ta^1_j is the task we want to force the agent to fulfill.

The definitions of cond_ig and cond_proa, which decide when the rule is activated and when it expires, are slightly dependent on the problem domain. More detail about these can be found in [SD+10] and [SD+11].
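The deviation-finding core of derive can be sketched as follows (a sketch only: rule objects are reduced to plain tuples, and the construction of cond_ig and cond_proa is abstracted away, as it is problem-dependent):

```python
def derive_rules(opt_rec, last, ignore_rule_failed=False):
    """Compare the plan against the emergent solution and emit one correction.

    opt_rec, last: lists of assignments (task, agent, time), aligned by index.
    Returns None when the emergent solution already follows the plan.
    """
    for (ta_plan, ag_plan, _), (ta_last, ag_last, _) in zip(opt_rec, last):
        if ag_plan != ag_last or ta_plan != ta_last:
            # Earliest deviation found: prefer the gentler ignore rule,
            # falling back to a proactive rule if the ignore rule failed.
            if not ignore_rule_failed:
                return ("ignore", ag_last, ta_last)    # suppress the stray task
            return ("proactive", ag_plan, ta_plan)     # force the planned task
    return None
```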

Chapter 3

The Extended Efficiency Improvement Advisor

In this chapter we introduce the main contribution of this thesis, the Extended Efficiency Improvement Advisor (EEIA). The EEIA represents an extension of the EIA to deal with additional types of knowledge in pursuit of new goals.

Recurring knowledge used by the EIA is only one possible source of knowledge. In practical applications (instantiations of DTF) there will be tasks known in advance, not because they were recurring and learned, but because they were promised in some way. In many industries, a customer can pay in advance, or make appointments and reservations. Failing to utilize this knowledge in planning comes with an efficiency loss. The main reason for the EEIA’s creation is to exploit this new knowledge and avoid this efficiency loss as much as possible.

As a secondary motivation for this change, we focus on the EEIA's effect when foreknowledge is complete (or near complete). The EEIA ought to be able to make the dynamic system it advises follow a plan, without sacrificing the fundamental dynamism of the system. In other words, influence must be strong enough that the pre-known tasks are fulfilled in the way prescribed by the EEIA's plan, but that same influence must not be so strong that dynamically appearing tasks cannot be handled as they would be normally (without advice).

In addition to these new goals, the previous strengths of the system should be upheld (and are; see Section 2.3). The EEIA continues to be a non-essential part of the system. When the EEIA fails, the surviving agents revert to their original behavior and continue to fulfill tasks. No additional communication is required for these new goals either. The EEIA's only knowledge of the environment comes from agent testimony, not direct observation, and the only influence comes in the form of exception rules, distributed prior to the start of a run instance.

In the following we first present the general working cycle of the EEIA in preparation for each run instance; then we focus on the key step in which it differs from the EIA.

3.1 The Working Cycle of the EEIA

Parallel to the EIA's 6 actions (see Figure 2.3), the EEIA has 7 actions, shown in Figure 3.1. While describing these actions, let there be a (possibly empty) set {ta^know_1, ..., ta^know_q} representing the known-ahead tasks that are given to the advisor before the next run instance starts. The sequence of steps performed by the EEIA is as follows:

1. receive collects the local histories H_i as before.

2. transform creates the global history GHist out of the agent histories as before.

3. extract creates the list of recurring tasks ta^rec_1, ..., ta^rec_p as before.

4. merge combines the recurring tasks ta^rec_1, ..., ta^rec_p with the known-ahead tasks ta^know_1, ..., ta^know_q to create the complete knowledge set for the next run instance.

5. optimize computes the optimal solution opt^{r+k} = ((ta^{r+k}_1, Ag'^{r+k}_1, t'^{r+k}_1), ..., (ta^{r+k}_{p+q}, Ag'^{r+k}_{p+q}, t'^{r+k}_{p+q})), Ag'^{r+k}_j ∈ A, t'^{r+k}_j ∈ Time, for the complete knowledge set of tasks.

6. derive creates exception rule sets R_i for all agents in A, reflecting opt^{r+k}.

7. send communicates to all agents their exception rule sets, to replace their current exception rule sets.

The steps receive and transform are left unaltered from the EIA. This provides slightly more information than necessary, as the EEIA only requires a list of the tasks observed. extract is unchanged, and the full description can be found in Section 2.3.1. No changes were made to the optimize or send steps, and they are performed exactly as described before.
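Putting the 7 steps together, one preparation cycle between run instances might be orchestrated as follows (a sketch against an assumed advisor object that exposes each step as a method; none of these names are prescribed by the thesis):

```python
def eeia_cycle(advisor, local_histories, known_ahead):
    """One EEIA preparation cycle, run between run instances (Section 3.1)."""
    ghist = advisor.transform(local_histories)         # steps 1-2: receive + transform
    recurring = advisor.extract(ghist)                 # step 3: learned recurring tasks
    knowledge = advisor.merge(recurring, known_ahead)  # step 4: add known-ahead tasks
    plan = advisor.optimize(knowledge)                 # step 5: a priori optimization
    rule_sets = advisor.derive(plan)                   # step 6: per-agent exception rules
    advisor.send(rule_sets)                            # step 7: replace agents' rule sets
```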

[Figure: the EEIA's pipeline - receive local agent histories; transform local agent histories into a global history; extract recurring tasks from the global history; merge static knowledge into the combined knowledge; optimize a solution of the combined knowledge; derive rules from the optimal solution; send derived rules to agents - around a shared data model (advisor states, agent knowledge, environment knowledge, rule sets, intermediate results, ...), sitting above the basic MAS.]

Figure 3.1: The actions of the EEIA.

The merge step is a new addition to the procedure, but not a complicated one. The set of known-ahead tasks {ta^know_1, ..., ta^know_q} is added to the pool of knowledge, already containing the recurring tasks from the extract step. Duplicate tasks are eliminated at this step, as a known-ahead task may be repeated often enough that it is, at this point, recognized as recurring. It would be unfortunate to assign 2 agents to the same task in this case, not realizing the recurring task and the known-ahead task were the same.

A complete overhaul has been performed on the derive step. Due to the introduction of known-ahead tasks, the optimal plan will change drastically from run instance to run instance. It is no longer possible to build up a rule set incrementally over time. Instead, the rule sets for every agent will be purged and rebuilt whole before each run instance.

Another consequence of this new approach is that "bad behavior" is not necessarily observed from previous run instances anymore. Correcting deviations from an optimal solution is not as straightforward when the optimal solution is different every time. The idea behind the EEIA's new derive step is to create a rule set that is a direct expression of a plan, rather than making corrections to past performance.

This rule set is strict enough to force agents to follow any plan (it prevents them from performing assigned tasks out of order), but flexible enough to allow for the original dynamic behavior. The dynamic behavior is simply delayed until the next interval of free time between assignments. A precise description of this rule set's creation is left to the next section.

3.2 The derive step

The EEIA's derive action follows the optimize action. This means that, from a body of knowledge containing both learned, recurring knowledge and known-ahead static knowledge, the EEIA is in possession of an a priori optimized opt^{r+k} ∈ Solution. This will be a list of assignments, assumed to be sorted (first by agent, then by time):

opt^{r+k} = [ (ta_1, Ag_0, t_1),
              (ta_2, Ag_0, t_2),
              (ta_3, Ag_0, t_3),
              ...
              (ta_{k+1}, Ag_1, t_{k+1}),
              (ta_{k+2}, Ag_1, t_{k+2}),
              (ta_{k+3}, Ag_1, t_{k+3}),
              ... ]

Given any assignment (ta_i, Ag_i, t_i) ∈ opt^{r+k} with ta_i = (detail_i, t^start_i, t^end_i), the EEIA creates the following exception rules for the various agents:

• For Ag_i it creates:

  - a proactive rule cond_proa(s,d) → prep(ta_i), where cond_proa is satisfied when:

    * the current time is in the interval (t_i - preptime(ta_i) - clustervar(ta_i), t_i + timeout);

    * Ag_i's log does not indicate prep(ta_i) has been performed after time t_i - preptime(ta_i) - clustervar(ta_i) (this mitigates damage from misleading knowledge);

    * there is no prior assignment, (ta_j, Ag_i, t_j), for Ag_i, or ta_j is already fulfilled (this implements chaining).

  - an ignore rule cond_ig(s,d) → ¬a_{ta_i}, with cond_ig(s,d) satisfied when the current time is in the interval (t^start_i, t_i).

• For every Ag_q, with q ≠ i, it creates:

  - an ignore rule cond_ig(s,d) → ¬a_{ta_i}, with cond_ig(s,d) satisfied when the current time is in the interval (t^start_i, t_i + timeout).

The conditions cond_ig and cond_proa can be communicated entirely in the DTF-abstracted language of tasks and timestamps. Generality at the DTF level cannot be upheld unless this is strictly enforced. This means that the following must be defined at the problem-specific level. Each of these problem-specific attributes is defined in the problem-specific sections; see Section 4.4.2 (Whack-a-mole) and Section 5.4.2 (Pickup and Delivery). A sketch of the full rule-generation procedure follows this list.

• ¬a_ta - as mentioned earlier.

• prep(ta) - as mentioned earlier.

• timeout - an integer, typically 10. Lowering this number means agents will be quicker to abandon a plan (in the case of an agent being late, or failing entirely).

• preptime(ta) - the time required to perform prep(ta). As prep(ta) can vary based on the target task, so can preptime(ta).

• clustervar(ta) - depends on the configuration of the clustering algorithm. It should estimate the potential time variation within a single cluster.
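Assembled from the rules above, the rule-generation procedure can be sketched as follows (a minimal sketch: rules are plain tuples carrying their activation intervals, tasks are assumed to expose t_start, and preptime, clustervar, and timeout come from the problem domain):

```python
def derive_eeia_rules(plan, agents, preptime, clustervar, timeout=10):
    """Build per-agent rule sets from an a priori optimized plan.

    plan: sorted list of assignments (task, agent, planned_time), where
    each task carries the t_start of its time window.
    Returns {agent: [rules]}; intervals are interpreted as open (Section 3.2).
    """
    rules = {ag: [] for ag in agents}
    for task, ag_i, t_i in plan:
        lead = preptime(task) + clustervar(task)
        # The assigned agent is pushed toward the task ahead of time...
        rules[ag_i].append(("proactive", task, (t_i - lead, t_i + timeout)))
        # ...but must not start it early, out of planned order.
        rules[ag_i].append(("ignore", task, (task.t_start, t_i)))
        # Everyone else keeps their hands off the assigned task.
        for ag_q in agents:
            if ag_q != ag_i:
                rules[ag_q].append(
                    ("ignore", task, (task.t_start, t_i + timeout)))
    return rules
```

This produces exactly the rule counts discussed below: one proactive rule per assignment, plus one ignore rule per assignment per agent.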

Time intervals in these conditions are written as excluding the endpoints, but the agent receiving the rule also has the option of interpreting them as inclusive if needed for problem-specific reasons.

The chaining mentioned refers to a condition of proactive rules which allows them to overlap in time intervals without conflicting. Rather than relying on the rule conflict manager to resolve this, the later rule is explicitly deactivated, pending completion of the earlier rule. This was introduced in [SD+11].

In summary, each agent is given a proactive rule for each of its own assignments. This has the benefit of making the agent "move" preemptively (perform prep), even before the task is announced. Obviously, starting sooner and finishing sooner will have a performance benefit by most measures.

In practice, the proactive rules are often already enough to give the performance gains of the entire rule set. They are not enough to make the agents follow a plan, though. An agent with free time between assigned tasks (between proactive rules) will often take this opportunity to steal a task which is assigned to another agent. While the performance loss from this is usually not gigantic, it is a deviation from the plan. Assuming the plan is optimal, this is almost certainly some performance loss. For this reason, agents must have an ignore rule for each other's assignments.

In addition, an agent sitting idle while waiting for an assigned task to appear may be compelled to fulfill a different task assigned to it. This is, in effect, self-stealing, and performing assignments out of order is yet another way to deviate from the plan. Since this second set of ignore rules is added for the idle time, they have no effect in practice on tasks planned for the beginning of their time windows (those tasks that should begin immediately). So, an agent must not only have ignore rules for the assignments of other agents, it must also have ignore rules for its own assignments. Admittedly, the performance effect of this is very small. This is done in pursuit of forcing the agents to adhere to a plan.

In total, this procedure creates up to m · |A| ignore rules and up to m proactive rules, where m is the run instance size and A is the set of agents. It may seem that m · |A| + m is a large number of rules, and possibly enough to diminish autonomy. Showing this is not the case, and showing that the base system can still function dynamically under this advice, is done experimentally. To do this we need applications.

3.3 Simple Example

Before proceeding to a full application we will begin with a very high-level and brief example. To do this, we need an instantiation of Dynamic Task Fulfillment. In order to make practical sense, we probably need more than a time window on a task, but not much more. Tasks for this example consist of a duration and a time window.

Definition 3.3.1 (Simple Example Task). Tasks are 3-tuples

Task^EX = Z × Time × Time

where

ta_i = (duration_i, t^start_i, t^end_i)

The duration in this task serves to add risk to the act of beginning a task (tasks must be finished, if started). For task similarity, we use Euclidean distance (on Z^3, since Time ⊆ Z), and for solution quality, we use "tasks unfulfilled" (to be minimized).
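Concretely, with tasks as 3-tuples this similarity measure is a one-liner (a sketch using Python's math.dist):

```python
import math

def sim(ta_a, ta_b):
    # Euclidean distance on (duration, t_start, t_end) in Z^3;
    # returns 0 exactly when the two tasks are identical.
    return math.dist(ta_a, ta_b)
```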

Consider a single deterministic agent, Ag^EX, facing this dynamic problem. It has complete knowledge of each task upon the announcement of that task, and no knowledge of others. Once it starts a task, it must complete it before beginning a new task. We define the agent so that:

• If a single task is available (and can be completed in the remaining time), it starts it.

• If multiple appropriate tasks are available, it attempts the soonest to expire.

When Ag^EX has an active ignore rule, ¬a_{ta_j}, for ta_j, it acts as though this task was never announced. When Ag^EX has an active proactive rule, prep(ta_j), for ta_j, it acts as though this task was the only task announced, whether it was announced or not.
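A sketch of this decision behavior (attribute names are hypothetical; "available" is read here as announced, not ignored, inside its time window, and completable before both its deadline and the end of the run instance):

```python
def choose_task(current_time, end_time, announced, ignored, forced=None):
    """Decision function of the simple example agent Ag^EX (a sketch).

    announced: tasks announced so far (each with .duration, .t_start, .t_end)
    ignored:   tasks suppressed by active ignore rules
    forced:    target of an active proactive rule, if any
    """
    if forced is not None:
        return forced          # a proactive rule overrides normal behavior
    available = [ta for ta in announced
                 if ta not in ignored
                 and ta.t_start <= current_time
                 and current_time + ta.duration <= min(ta.t_end, end_time)]
    if not available:
        return None            # stay idle
    # With several candidates, attempt the soonest to expire.
    return min(available, key=lambda ta: ta.t_end)
```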

Now, consider a run of length 1 (run = [ri_1]), where the only run instance is defined as:

ri_1 = ( (1, (1, 1, 4)),
         (3, (5, 3, 10)),
         (4, (1, 4, 6)),
         (5, (2, 5, 9)) )

where each event is written (t^avail, (duration, t^start, t^end)). For short we may say ri_1 = [ev_1, ev_2, ev_3, ev_4], where each ev_i = (t^avail_i, ta_i). This run instance is illustrated as a timeline in Figure 3.2 (darkened, inner rectangles indicate the execution in the final table of this section).

Obviously, in a run of length 1, there is no opportunity for learning, and the EIA would not be able to create rules. So, the unadvised emergent solution would be the same as the solution under the EIA's guidance, and would look like this (as a sequence of agent actions):

[Figure: horizontal bars for ta_1 through ta_4 laid out against time steps 1 to 11.]

Figure 3.2: High-level example run instance.

Time Step   Situation              Action
0           {}
1           {ta_1}                 complete ta_1
2           {}
3           {ta_2}                 start ta_2
4           {ta_2, ta_3}           continue ta_2
5           {ta_2, ta_3, ta_4}     continue ta_2
6           {ta_2, ta_4}           continue ta_2
7           {ta_2, ta_4}           complete ta_2
8           {ta_4}
9           {}
10          {}

This gives a solution quality of 2 unfulfilled tasks. In hindsight, it is clear what the mistake was.

Under these circumstances the EEIA would also not be able to learn anything; however, it can exploit other knowledge sources. For the sake of this example, assume the knowledge set known = {ta_3} is being included in the EEIA's merge step. The plan generated during the optimize step could only create one assignment. According to the derive step, it generates 2 rules, one ignore and one proactive, both for our only agent, and both for ta_3. Assume preptime returns a constant 2, timeout = 5, and proactive rules override ignore rules.

Time Step   Situation          After Rules        Action
0           {}                 {}
1           {ta_1}             {ta_1}             complete ta_1
2           {}                 {ta_3}
3           {ta_2}             {ta_3}
4           {ta_2, ta_3}       {ta_3}             complete ta_3
5           {ta_2, ta_4}       {ta_2, ta_4}       start ta_4
6           {ta_2, ta_4}       {ta_2, ta_4}       complete ta_4
7           {ta_2}             {ta_2}
8           {ta_2}             {ta_2}
9           {ta_2}             {ta_2}
10          {}                 {}

Since, ta3 was assigned to the agent to begin exactly at the opening of its time window, the ignore rule was never in effect. The only rule which did anything was the proactive rule, which was enough to avoid the costly ta2.

Although this was a clearly contrived run instance, it turns out that opportunities like this to apply partial knowledge are very common, and such positive results are shown to be reliable in many circumstances in Chapter 6.

Chapter 4

Simple Application: Whack-a-mole

In this chapter we will look at a familiar and concrete example of Dynamic Task Fulfillment. The example in Section 3.3 created dilemmas and risk in choices by using a duration attribute. A more interesting way to create the same properties is to fix the problem in a real-world game.

Whack-a-mole is in many ways the ideal first application. It is known and understood worldwide. It also serves as a universal metaphor for frantically addressing a sequence of problems that appear without warning. While literally improving a score on an arcade game is not a great motivation for research, optimization, and planning, it should be clear that these techniques can be applied equally well to things that resemble Whack-a-mole:

• Pulling weeds in a garden

• Spraying houses for bed bugs

• Putting out wildfires

• Answering phones in a call center

If examined, it is likely that anything to which Whack-a-mole can be applied as a metaphor is an instantiation of Dynamic Task Fulfillment, and an area where the EEIA can help.

Originally, Whack-a-mole was an arcade game, and a mechanical one [Wit08], rather than a video game. It consisted of a table-top with several holes cut in it. The players were armed with cushioned mallets, which were tethered to the table.

In a pseudo-random way, moles would rise up from the holes in the table. The goal of the players is to hit the risen mole with a hammer, which results in points for the player. If the mole was not hit by any player within a short time, it would recede back into the hole, granting the players no points.

It may seem unrealistic that there would be any repetition, or foreknowledge, or potential for learning, in a game like this, and that is probably true for modern versions. Consider the difficulty of implementing a random number generator in a mechanical system, or in a simple electronic system limited to reading a pseudo-random pattern from a magnetic tape. The older, original constructions had this possibility (or we can pretend they did). Besides, problems related to Whack-a-mole by metaphor almost always have predictable events.

These arcade cabinets typically had fewer holes than the environments we use in experiments. Many games had as few as 5 holes arranged in a line, and rarely got larger than 3×3 in size. Human players were competing on reaction time, hand-eye coordination, and speed (arm strength), which may make the game fun for humans, but is not so interesting from the "partial future knowledge" perspective.

Simulators and mathematical abstractions tend to erase these aspects of the game by setting a constant speed limit [GK+06] (1 edge per time step) and making reaction time effectively instant (as a computer's is).

Consider this mole schedule as an example, where the hammer starts at the location 0,1, and can move one edge per time step, and loses a time step by hitting (pictured in Figure 4.1):

Location Start End

0,0 1 16

4,0 0 12

4,4 4 8 At time 0, there is only one mole visible, and little knowledge available for the player to make decisions. However, this changes shortly.

There may be a temptation to hit the moles in the same order they appear (which is commonly

30 l0,0 l1,0 l2,0 l3,0 l4,0

l0,1 l1,1 l2,1 l3,1 l4,1

l0,2 l1,2 l2,2 l3,2 l4,2

l0,3 l1,3 l2,3 l3,3 l4,3

l0,4 l1,4 l2,4 l3,4 l4,4

Figure 4.1: Example mole schedule, and comparison of hammer choices. called the First-In-First-Out (FIFO) strategy), which would result in the following sequence, hitting

2 out of 3 moles:

Location Start End Hit At

0,0 1 16 10

4,0 0 12 5

4,4 4 8 miss The player may choose to pursue whichever mole is closest to the hammer (which we call the

Nearest-Neighbor (NN) strategy), which would result in the following sequence, also hitting 2 out

of 3 moles:

31 Location Start End Hit At

0,0 1 16 3

4,0 0 12 8

4,4 4 8 miss In hindsight it is obvious that all 3 moles can be hit (an optimal solution) if the earliest mole to expire is pursued first, generating this sequence:

Location Start End Hit At

0,0 1 16 15

4,0 0 12 10

4,4 4 8 5 However, this is not possible unless the player has foreknowledge of the mole in the lower right.

The good news is that one of the other solutions with the added knowledge of this single mole would be enough to achieve an optimal solution. Knowing all 3 moles in advance would not be necessary.

The game of Whack-a-mole has a simple set of game rules, making it a good illustration for both DTF and the advisor. It replaces the duration attribute from Section 3.3 with travel time.

Complexity is added by this, since travel time is variable (depends on starting location). It also presents the opportunity to change course, abandoning one task for another, although this is often at increased travel cost. The above example shows that every choice considered a misstep is also one which causes travel cost to be inflated above available time. While these side effects contribute to an interesting and meaningful solution quality measure, they are also hidden completely in the solution quality. The advisor itself does not change to accommodate them.

In the following sections, partial knowledge is applied to both the FIFO and NN solutions, showing that this effect is common and not only restricted to examples as contrived as this one.

32 4.1 The Environment

The environment will be an undirected graph generated from a N M grid. Edges will be included × from vertices adjacent vertically, horizontally and diagonally. Varying sizes are used to show how performance relates to problem size (illustrated in Figure 4.2).

l0,0 l1,0 l2,0 l3,0 l4,0

l0,0 l1,0 l2,0 l0,1 l1,1 l2,1 l3,1 l4,1

l0,1 l1,1 l2,1 l0,2 l1,2 l2,2 l3,2 l4,2

l0,2 l1,2 l2,2 l0,3 l1,3 l2,3 l3,3 l4,3

l0,4 l1,4 l2,4 l3,4 l4,4

3 3 Environment 5 5 Environment × ×

Figure 4.2: The Whack-a-mole environments, varying sizes (are not required to be square).

This graph will be called gNM, and formally,

gNM = (Loc,E), where E Loc Loc ⊆ ×

Here, Loc is the set of all locations.

The following are a list of conventions that have been chosen. The majority are arbitrary, and changing them should have little effect on results (as long as a corresponding change is made in the solution quality measure).

All edges have weight 1 (including diagonals). • Time and movement are discrete. • The speed is limited to 1 move per time step. •

33 The hammer cannot move and whack a mole in the same time step. • The hammer may move from one vertex to another directly connected vertex. • A mole may share a vertex with a hammer. • Two hammers cannot share a vertex. • If a mole hides, it is not visible. • If a mole has been whacked, it is not visible. • When a hammer whacks a mole, any visible mole sharing a location with the ham- • mer is considered whacked.

A hammer attempting to whack a mole at a location where no moles are visible has • no effect (except to waste time).

Each time step, the moles act first, then the hammers. • The uniform edge weights conveniently allow all path finding to be done by breadth first search.

This and the majority of the other choices were made arbitrarily. The EIA and EEIA do not need to change to operate in different environments (or conventions different from those above). Any adjustments taking place will be hidden in the detail of the Solution Quality and Task Similarity measures, which are provided as parameters.

34 4.2 Dynamic Task Fulfillment Perspective

For anyone familiar with the game of Whack-a-mole, there is no doubt about the type of event that

can occur, and the type of work expected in response. There will be exactly one task per mole,

with the Event representing the mole emerging from its hole, and the Task of whacking it with a hammer before it disappears again.

Definition 4.2.1 (Mole Task). A Mole Task is a Task as in Def. 2.1.3,

start end tai = (detaili,ti ,ti ),

start end with t ,t Time, and the detaili = li is simply the mole’s location. i i ∈ A mole in Whack-a-mole is really no more complicated than its time window and location. Both for planning purposes, and dictating the mole’s behavior, this 3-tuple gives a complete representa- tion.

4.2.1 Task Similarity

The end of each mole’s time window is rarely known (even in hindsight). Whacking the mole erases this information. Although neglecting the mole would reveal the whole time window, this

start is detrimental to the score. Due to this, the ti and li are considered definitive of the Mole Task, end while the ti is given less weight in some decisions. In fact, for the purposes of the task similarity measure, it is ignored.

Definition 4.2.2 (Mole Task Similarity). Given a pair of Mole Tasks,

mole mole mole sim : Task Task R, × → is defined as:

mole  start end start end  start start sim (li,t ,t ),(l j,t ,t ) = dist(li,l j) + t t i i j j i − j

where dist(li,l j) is the shortest path length found by breadth first search.

35 4.2.2 Optimization Criterion

The choice of solution quality is the obvious one, the number of moles hit (to be maximized), or

the number of moles missed (to be minimized). While this is a simple count that takes place during

the run of a simulator, we also require a method to calculate this directly from a list of assignments,

which is much faster than simulating.

mole Definition 4.2.3 (Mole Solution Quality). The quality, qual : Solution R, is the number of → moles whacked (to be maximized).

Finding the number of moles whacked versus not whacked is easily returned from the logs of a

MAS simulation. However, we also require the capability of determining the quality of an arbitrary

solution, which is a list of assignments.

It is assumed that any information about the world required for dist is available, and also the starting locations, lstart, are known and constant across all solutions to be evaluated. Ag j

To describe this process, first assume the assignments are in a list, sorted by assigned agent, then by assigned time.

[(ta1, g0,t1), A

(ta2, g0,t2), A

(ta3, g0,t3), A . .

(tak 1, g1,tk 1), + A +

(tak 2, g1,tk 2), + A +

(tak 3, g1,tk 3), + A + . .

]

Every assignment, (tai, gi,ti), will be checked against the following criteria: A 36 start end Where tai = (li,t ,t ), a hit requires: • i i

start end t ti < t i ≤ i

Where i > 1 and tai and tai−1 are assigned the same agent, a hit (on tai) requires: •

ti ti−1 > dist(li,li−1) + 1 −

The +1 represents the time step required for the whack action.

Where i = 1 or tai and tai−1 are not assigned the same agent, a hit (on tai) requires: •

t dist l lstart i > ( i, Ag j ) + 1

Where g j is the agent assigned to tai (+1 for the whack action). A In this way each assignment is marked as either a hit, or miss. Ideally a solution or plan would not include multiple assignments for the same task, but if it does, the following instructions settle any ambiguity:

At least one hit as determined by the conditions above means the mole is hit. •

Multiple hits on the same tai count as a single hit. •

A tai which is not included by any assignment is considered to be a miss. • This gives a definition for our quality function:

mole qual : Solution R → qualmole(sol) = ∑ miss(as) as∈sol

Note that the actual game world does enforce the condition that no 2 hammers can share a loca- tion. This can lead to hammers being delayed when their paths cross (or worse, block each other completely). The simulator will take this into account, but the geometric assessment provided by qualmole will not. This will be a recurring theme.

37 4.2.3 Bounding Function

The branch and bound algorithm (see Section 2.3.2 and Appendix A) requires a bounding function

that corresponds to the existing optimization criterion. This bounding function is described in this

section rather than in the appendix for 2 reasons. First, it is (slightly) problem specific, and second,

it is closely related to the qualmole function.

Bounding functions are not required for a problem to qualify as a Dynamic Task Fulfillment prob- lem, but for Whack-a-mole, and specifically the qualmole measure we have defined, a bounding

function does exist. This is easy to produce from qualmole, since the qualmole procedure is already

compatible with partial solutions (a list of tasks, acting as prefix for a full solution, as in Def.

2.1.5), we can use it in the definition of the bounding function.

If we define another function:

unassigned : RunInstance Solution (Task) × → P  0 0 0 unassigned(ri,sol) = ev (ta ,t ) ri, x,y (ta ,x,y) / sol ∈ ∀ ∈

This will reveal how many events are missing assignments in the solution. A bound is then given

as follows (assuming the ri is available):

mole bound : Solution R → boundmole(sol) = qualmole(sol) unassigned(ri,sol) − | |

This bounding function will take the typical assessment used for complete solutions and reverse the convention on unassigned tasks. All unassigned tasks will be considered hits rather than misses.

As long as the bounding function is used under only the condition that multiple assignments will not be given to the same task, no complete solution built up from the provided partial solution can exceed this bound.

38 4.3 Agents

Moles will be treated as agents in this implementation. They are not actively conspiring to help or

hurt the score; they are simply following a schedule.

Definition 4.3.1 (Mole Agent). A Mole Agent is an agent

gmole = (Sitmole,Actmole,Datmole, f mole), A Ag where:

Sitmole = Loc Time Loc∗ Loc∗ - The mole is informed of its current location, • × × × the current time, the list of all visible moles, and the list of all hammers, each

iteration

Actmole = show,hide - The mole will only appear or disappear • { } Datmole = Time Time - The mole only needs to know its schedule, (tstart,tstop) • × f mole : Sit Dat Act can be defined as follows: • Ag × →   show : tstart tcurrent < tstop mole start stop   fAg sit,(t ,t ) = ≤  hide : else

where sit = (lcurrent,tcurrent,moles,hammers)

Definition 4.3.2 is the abstract hammer. Different solutions (here, FIFO and NN) fit within the

definition. It simply fixes the definitions of Act and Sit, to give a consistent interface with the game

world.

Definition 4.3.2 (Hammer Agent). A Hammer Agent is an agent

ghamm = (Sithamm,Acthamm,Dathamm, f hamm), A Ag where:

39 Sithamm = Loc Time Loc∗ Loc∗ - The hammer is informed of its current loca- • × × × tion, the current time, the list of all visible moles, and the list of all hammers, each

iteration

hamm Act = movel,whack,wait - The hammer may move in steps across the envi- • { } ronment, then whack any moles it encounters. These moves are parameterized by a

location l.

Dathamm - This is specific to the solution • f hamm : Sit Dat Act - This is specific to the solution • Ag × → FIFO (first-in-first-out) is an extremely simple and reasonably effective solution to Whack-a-mole.

This hammer tries to whack moles in the same order they appear. If the number of simultaneously visible moles is low, it is close to the imperfect solution a human player might make. When the number of simultaneous moles is high, though, the ability to remember order is not very human

(or technically helpful). Troubleshooting and testing are simple because there is no randomness involved, nor any calculation that would be difficult on pen and paper. When a tie-breaker is necessary (when two or more moles appear at once), left-most in the situation list will be chosen.

The FIFO hammer’s actions are never a mystery and it is easy to contrive solutions both where it does well and does poorly.

Definition 4.3.3 (FIFO Agent). A FIFO (first-in-first-out) Agent is a hammer agent

gfifo = (Sitfifo,Actfifo,Datfifo, f fifo), A Ag where:

Sitfifo = Sithamm • Actfifo = Acthamm • fifo ∗ Dat = gNM Loc - Storage for a map of the environment and a queue of mole • { }× locations

40 The specific function is as follows: •

f fifo : Sitfifo Datfifo Actfifo Ag × →   wait : queue  = []    fifo current fAg sit,dat = whack : l queue  ∈   movelnext : else

where queue = intersect(queueold,moles) + moles

and lnext = head(path(lcurrent,head(queue)))

and sit = (lcurrent,tcurrent,moles,hammers)

old and dat = (gNM,queue )

Here intersect is the list intersection preserving the order of the left list, and + indicates concatena-

old tion. The new queue calculated will be stored in Dat to be the next queue . path(lx,ly) indicates a list of locations describing the shortest path from lx to ly (excluding lx), found by breadth-first search.

A different dynamic solution to Whack-a-mole is NN (nearest-neighbor), and it produces solutions which are slightly better than FIFO in the average case. Each hammer in this solution simply moves closer to whichever visible mole is closest. When a tie-breaker is necessary (when two or more moles are equally close), left-most in the situation list will be chosen. Again, this produces solutions which are very close to the behavior of a human player, and the choices it makes are always fully obvious in advance.

Definition 4.3.4 (NN Agent). A NN (nearest-neighbor) Agent is a hammer agent

gnn = (Sitnn,Actnn,Datnn, f nn), A Ag where:

Sitnn = Sithamm •

41 Actnn = Acthamm • nn Dat = gNM - Storage for a map of the environment • The specific function is as follows: •

f nn : Sitnn Datnn Actnn Ag × →   wait : moles  = []    nn current current current fAg (l ,t ,moles,hammers),gNM = whack : l moles  ∈   movelnext : else

where lnext = pathlcurrent,mindist(lcurrent,moles)

These 2 strategies (FIFO and NN) will be analyzed side by side to show the flexibility of the

EEIA’s implementation and the consistency of its results. FIFO is used in single hammer exper- iments, while NN is used in multi-hammer experiments. FIFO is the weaker strategy in general, but becomes incredibly poor when multiple agents use it together.

The first mole to appear for one agent is always the first mole for every other agent too. Multiple

FIFO agents will always choose the same target, effectively reducing their productivity to that of a single agent. NN is stronger since the distance measurements are different from the perspective of different locations, at least giving the potential for agents to split up the work between them. The fact that the environment forces the hammers to have different locations means their distance cal- culations will be different from the beginning of the run instance, further increasing the favorability of NN over FIFO.

This is important because it shows the extent of the coordination between agents. Strategies like the aforementioned Contract Net Protocol [Sm80] would allow agents to claim moles prior to investing time in them. More importantly, they would avoid wasting effort on moles claimed by other agents. FIFO and NN do not have abilities this effective. In fact, the coordination strategy for NN and FIFO is to not coordinate.

42 Despite not having this ability ingrained, it is partially created through the influence of the Advisor.

Ignore rules discussed in the next section can create the effects of coordination where it does not exist.

4.4 Required Modifications of Agents for Working with An Advisor

An advisor requires the agents to provide their histories after each run instance, and also to read and follow rules. Neither of these behaviors were included in the basic definitions of the hammer agents, so they will have to be enhanced. Since, for Whack-a-mole, we are considering 2 dynamic solutions to be improved, both must be adapted. They will be described in parallel to show the similarity.

4.4.1 Providing Histories

Both the gnn (the nearest-neighbor hammer) and gfifo (the FIFO hammer) will be exposed to the A A same situations, in the same world, so it makes sense that they would interpret and report histories in exactly the same way. That is what has been done, so the description here applies to both of them.

The Dat for both gnn and gfifo will be modified to include a log of events as perceived by the A A agents.

A log entry will have the form,

Entry = Loc Time appear,disappear × × { }

And the log will have the form,

Log = Entry∗

At each time step, an agent has the opportunity to add 0 or more entries to its own log. These

43 entries are chosen by comparing the situation at the current time step (current Time): ∈

sitcurrent = (lcurrent,current,molescurrent,hammerscurrent) to the situation in the previous time step (current = prev + 1):

sit prev = (l prev,current 1,molesprev,hammersprev) −

At every time step (except the first), appearances are calculated as follows:

current  current prev a = (l,current,appear) l moles , l / moles ∈ ∈

Disappearances are calculated as follows:

current  prev current d = (l,current,disappear) l moles . l / moles ∈ ∈

Then the log is updated:

logcurrent = logprev acurrent dcurrent ∪ ∪

At the end of the run instance, the final log, log final, is analyzed, creating a list of Mole Tasks. We use a parameter minwin = 8 to ensure any unusually short time windows are extended to some minimum, which will be more useful in planning (these were obviously whacked early, annihilating the true information). Any mole will occur in this list as an appearance followed by a disappearance, so we look for such pairs:    2  (la,ta,appear),(ld,td,disappear) log ,   ∈ final    history = (la,ta,max(td,ta + minwin)) l = l ,t < t , a d a d      (l j,t j, ) log final, l j = la (t j < ta td < t j),  ∀ ∈ ⇒ ∨ This condition allows for multiple moles in the same hole, as long as they do not appear simul-

taneously, which would not be compatible with intuitions about the game. The 3-tuple (la,ta,td) produced is the structure needed for a Mole Task, so it may be returned to an advisor as a local history.

44 The extra detail in forming the endtime, max(td,ta + minwin), is required because this value is not normally known to the hammer. If a mole is whacked, it disappears in that moment. The actual end of the time window can only be discovered if the mole is left unwhacked, which would lower the score if done intentionally. This is also why the endtime for the task is excluded from the Task

Similarity measure, simmole.

4.4.2 Handling Rules

Section 2.3.3 gave an abstracted condition for each rule type, leaving a total of 5 attributes,

timeout, preptime,clustervar, ata, prep(ta), ¬ to be defined at the problem specific level. These first 3 can be made the same for both FIFO and

NN, and they are sufficient to calculate both condig(s,d) and condproa(s,d).

timeout = 10 • current preptime(ta j) = dist(l ,l j) + 1 • start end current where ta j = (l j,t j ,t j ) and l is the hammer’s location.

clustervar(ta j) = 0 • Hammer failures are not a concern in our experiments. Future experiments many examine perfor- mance when such events occur, giving good reason to closely examine the timeout variable. Until then, timeout = 10 is sufficient.

The last value, clustervar(ta), may need to be tuned to suit the data. For example, if the clustering algorithm in the extract step of the advisor was tuned to allow more variation in individual clusters, a corresponding change would need to be done to clustervar(ta). In all the experiments done here it is not necessary to apply it because of the way the run instances are constructed (see Chapter 6), however, if there is more noise (similar, but not equal tasks) in the individual histories returned by agents, it would be necessary.

45 For the remaining 2 attributes ( ata, prep(ta)), these must be interpreted by the agents and con- ¬ hamm verted into actions or action sequences from the set Act = movel,whack,wait , which was { } the agent’s interface with the world. These attributes will be handled slightly differently for FIFO

and NN.

Ignore Rules and FIFO

Assuming condig(s,d) has been met, for some rule condig(s,d) ata , the ata will effect the → ¬ j ¬ j action chosen by gfifo indirectly. For gfifo, Ignore rules are applied to the queue after the queue A A is updated to reflect the situation, but before a decision is made based on the queue.

Ignore taj at lj

front li lj ... lk lh back front li ... lk lh back

fifo Figure 4.3: The application of an ignore rule, ata , to a queue for g . ¬ j A

start end The rule, ata to ignore the task ta j = (l j,t ,t ), will hide every occurrence of l j in the queue ¬ j j j prior to decision making, as illustrated in Figure 4.3. This is no problem for applying multiple

ignore rules simultaneously, as the action of filtering a list is commutative. We filter the queue here

instead of the situation so that the queue ordering will be preserved in case the rule expires before

the task is fulfilled.

After applying the rule, the agent’s normal decision process is left to determine the actual action,

with its perception altered.

Ignore Rules and NN

For gnn, the agent is more directly redacting the Sit it is given. All tasks matching the rule are A removed from the list of moles before distances are considered.

46 ... lj

...

Ignore taj at lj mole,j d

d d

mole,i mole,k mole,i

lmole li lk d lmole li

d mole,k

d d mole,h mole,h

lk

lh lh

nn Figure 4.4: The application of an ignore rule, ata , to a list of tasks distances for g . ¬ j A

start end Given an ignore rule ata for task ta j = (l j,t ,t ) and a situation, ¬ j j j

current current sit = (l ,t ,[li,i j,...,lk],hammers),

when this rule is in effect, all occurrences of ta j in moles = [li,i j,...,lk] will be excluded from consideration in the distance calculation.

This could be considered identical to the FIFO rule, except that the situation is filtered directly, instead of modifying the queue.

Proactive Rules and FIFO

Assuming condproa(s,d) has been met, for some rule condproa(s,d) prep(ta j), we will take a → fifo similar approach in finding an action sequence for g , as was done with ata . While there were A ¬ j other options to handle this rule (inserting a new location at the front of the queue, for example), the choice was made to overwrite the queue entirely, and give it only a single element. This is illustrated in Figure 4.5. This extremely simplifies all the definitions.

prep(taj) at lj

front li lj ... lk lh back front lj back

Figure 4.5: The application of an proactive rule to a queue for gfifo. A

47 start end When a proactive rule, prep(ta), for the task ta j = (l j,t j ,t j ) is in effect the following substi- tution will be made: queue [ta j]. The decision function proceeds normally afterward. ⇒ It is important to note that when any rule (proactive or ignore) is applied to the queue, the un- modified queue is still stored. When the rule’s effect is no longer active, the queue returns to its original state, with the original order intact. These changes made by rules can be thought of as virtual.

Proactive Rules and NN

A proactive rule, prep(ta j), for ta j will make ta j the only task given for the distance comparison, and then automatically the best available choice. This is illustrated in Figure 4.6.

lj

...

prep(taj) at lj mole,j d

d d

lmole mole,i li lmole mole,j lj

d mole,k

d

mole,h

lk

lh

Figure 4.6: The application of an proactive rule to a list of tasks distances for gnn. A

When this rule is in effect the following substitution will be made, moles [ta j]. Parallels to the ⇒ FIFO case should be obvious. We perform the same modification, but to the situation rather than the queue. The decision function proceeds normally afterward.

4.4.3 Rule Conflicts

Rules are applied in an order given by a list (see apply and applyAll defined in Section 4.4.4), and

in an unintelligent way, that technically resolves all conflicts.

48 ata at l prep(ta ) at l ¬ j j j j

front li lj ... lk lh back front li ... lk lh back front lj back

prep(ta ) at l ata at l j j ¬ j j

front li lj ... lk lh back front lj back front back

Figure 4.7: An Ignore rule conflicting with a proactive rule for the same task. This is the difference made by order of rule application.

Where 2 ignore rules ( ata and ata ) are applied, the order of application will not change the ¬ j ¬ k result.

fifo However, if 2 proactive rules (prep(ta j) and prep(tak)) are applied, using g as an example, A these overwrite the queue, so only the last rule applied will matter. Applying prep(tak) second would give queue = [lk], but if applied first queue = [l j]. We sidestep this issue by using the optional trigger tasks in the proactive rules, and a careful rule creation strategy on the part of the

EEIA. No more than one proactive rule should be in effect at a given time.

Finally, with a ignore rule and proactive rule ( ata and prep(ta j)), there could be issues. If they ¬ j were for different tasks the order would not matter, but when they are for the same task, order is

very important. This is illustrated in Figure 4.7. Applying proactive rules first results in an empty

queue, so to avoid this, ignore rules must be applied first.

This “empty queue” problem obviously also occurs in the nearest-neighbor case. For this reason,

both gfifo and gnn have been built to sort their rules, putting ignore rules first, and proactive A A after.

49 4.4.4 New Agents

Both gadnn and gadff (the gnn and gfifo, respectively, modified for advice) will use nearly the A A A A same method for applying rules, so many of the definitions are universal. The following functions will be used by both:

apply : Loc∗ Rule Loc∗ × →   list : rule condition not met  ta j  apply(list,rule ) = ta j [l j] : ruleta j is proactive    removeAll(list,l j) : ruleta j is ignore

applyAll : Loc∗ Rule∗ Loc∗ × →   list : rules = [] applyAll(list,rules) =  applyAll(apply(list,r1),[r2 ...,rz]) : rules = [r1,r2,...,rz]

gfifo will apply it to the queue, and gnn will apply it directly to the mole list in Sit, but otherwise A A they are the same.

Definition 4.4.1 (Advised FIFO Agent). A FIFO (first-in-first-out) Agent is a hammer agent

gadff = (Sitadff ,Actadff ,Datadff , f adff ), A Ag where:

Sitadff = Sithamm • Actadff = Acthamm • adff ∗ ∗ Dat = gNM Loc Rule Log - Map of the environment, queue of loca- • { } × × × tions, rules, and a log of events

50 The specific function is as follows: •

f adff : Sitadff Datadff Actadff Ag × →   wait : queue0  = []    adff current 0 fAg sit,dat = whack : l queue  ∈   movelnext : else

where queue = intersect(queueold,moles) + moles

and queue0 = applyAll(queue,rules)

and lnext = head(path(lcurrent,head(queue0)))

and sit = (lcurrent,tcurrent,moles,hammers)

old and dat = (gNM,queue ,rules,log)

Since rules turn themselves on and off for time-based reasons, this could interfere with the ordering of the queue in unpredictable ways. To avoid this effect, the queue in the agent’s Dat is still updated to queue as before (Dat is not updated to queue0, which is modified by rules).

Definition 4.4.2 (Advised NN Agent). A NN (nearest-neighbor) Agent is a hammer agent

gadnn = (Sitadnn,Actadnn,Datadnn, f adnn), A Ag where:

Sitadnn = Sithamm • Actadnn = Acthamm • adnn ∗ Dat = gNM Rule Log - Map of the environment, rules, and a log of events • { }× ×

51 The specific function is as follows: •

f adnn : Sitadnn Datadnn Actadnn Ag × →   wait : moles0  = []    adnn current 0 fAg sit,dat = whack : l moles  ∈   movelnext : else

where moles0 = applyAll(moles,rules)

and lnext = pathlcurrent,mindist(lcurrent,moles0)

and sit = (lcurrent,tcurrent,moles,hammers)

and dat = (gNM,rules,log)

The gadnn modification is even easier, with no queue to update each round. Both of these agents A will, however, update their logs according to Section 4.4.1. This is not explicitly written in the decision function, as it does not contribute to the choice of Act.

52 Chapter 5

Complex Application: Pickup and Delivery

In this chapter we apply the advisor to another application, Pickup and Delivery with Time Win- dows (PDPTW). PDPTW is a far more important problem than Whack-a-mole, since businesses are based on it [DC05][HH+02][WBH08]. Rather than acting as a metaphor for something useful it is a direct representation of an industrial setting. With its many variations, it captures public transit, postal services, taxis, the shipment of goods by long-haul trucks, and forklifts on a factory

floor. In all cases, efficiency relates directly to a real financial cost. For some, fuel spent is the concern, for others, drivers’ wages, or the number of vehicles required is more important.

Cochrane p p p 36km p82km p p LakeLouise Banff Canmore Calgary MedicineHat 58km 26km 111km 295km

Figure 5.1: Larger cities and towns on the Trans Canada, labeled with distances separating them.

Locations in PDPTW customer requests come in pairs, one for the source, and one for the des- tination (or departure and arrival). This pairing is the single largest deviation from the previous application, Whack-a-mole. It means the consequences of decisions made from partial knowl- edge will be hard to predict. Durations of commitments now last as long as the load (or person) is carried, instead of beginning and ending on the same time step (a single whack in Whack-a- mole). Risk and reward are both amplified. It should be noted that this change does not require any difference in the EIA or EEIA; these details are hidden with solution quality and task similarity measures.

This will be illustrated with an example about a hypothetical limousine service, operating on the

Albertan section of the Trans Canada Highway, Figure 5.1. This limousine service will travel the

53 Pickup Delivery Payment Start End Passengers Calgary Banff 0:00 4:00 6:00 1 LakeLouise Cochrane 0:00 8:00 11:00 1

Table 5.1: Customer requests made in advance, to allow planning.

Location Time Calgary 0:00 Driver sleeping Calgary 4:00 Pickup the customer Banff 5:30 At speed limit, less than 1.5h travel LakeLouise 6:10 About 32min to get to Lake Louise LakeLouise 8:00 Wait until the customer is ready Cochrane 9:40 Slightly over 1.5h travel for last delivery

Table 5.2: Based on the known-ahead requests, the driver can make a plan.

highway, and transport any received customer requests (also known as fares). Assume the driver

lives in Calgary and travels the speed limit, 110km/h.

In this service it is the norm for customers to reserve seating some time in advance, to ensure the driver can pick them up. Those customers that call at the last minute may or may not be able to get service that day, but the driver will make an attempt. Let Table 5.1 be the jobs where the customers have paid in advance, guaranteeing service.

From the available knowledge, the driver can form a plan for the day. With only 2 jobs, it is obvious in which order the jobs should be done; see Table 5.2. This plan is illustrated in Figure 5.2.

1:00 2:00 3:00 4:00 5:00 6:00 7:00 8:00 9:00 10:00 11:00 LakeLouise

Banff Canmore

Cochrane Calgary

MedicineHat

Figure 5.2: The plan, shown as the graph of the highway extended over the day (a space-time diagram). A blue line is used to represent the lifeline/trajectory of the vehicle.

We will use the term “fare” to refer to our customer requests. The fares are shown as parallelograms

54 1:00 2:00 3:00 4:00 5:00 Canmore

Calgary

Figure 5.3: An example fare, illustrating the actual time window. in time, stretching from the earliest possible path fulfilling the request, to the latest. As in Figure

5.3, these fares do not have nearly the flexibility that they appear to have based on the large time windows. What would normally be a rectangle, reaching from the earliest pickup in one corner, to the latest pickup in the opposite corner, must be “clipped” into a parallelogram. The exact angle of this clipping is the speed limit of the vehicle, and shows the fare becoming more difficult if the speed gets lower. The tail pictured on the fare in Figure 5.3, is to show the third time variable, indication when payment (equivalently, when knowledge) was received. This is simply a visual distinction between the dynamic and planned fares.

Consider now, the case where the driver begins to execute this plan and more jobs arrive dynam- ically. Leaving Calgary with the first fare at 4:00am, the driver arrives in Banff at 5:30am for the delivery. Deviating from the plan provides an opportunity to make money, so the driver does.

Upon arriving early (6:00am) in Lake Louise to pick up the 8:00am fare, the driver learns of a new request which can be fulfilled first. This fare is from Banff to Lake Louise, available at 6:15am, so the driver backtracks to Banff by 6:30am. This is a slight efficiency loss, in distance but not time, as the driver still ends up briefly idle after completing delivering this customer.

One more dynamic fare is called in, this time from Banff to Canmore, starting at 7:45am. The

7:45am fare being dynamic represents a slight time inefficiency, but not a distance inefficiency.

The remainder of the plan is delayed to fulfill it, while the vehicle had been idle earlier. Had fore- knowledge of this fare been available, the driver could have moved preemptively, to reach Banff by 7:45am. This somewhat inefficient path, and knowledge changing over time, is illustrated in

Figure 5.4, which shows the landscape of future expectations changing as time progresses. The

55 Pickup Delivery Payment Start End Passengers Calgary Banff 0:00 4:00 6:00 1 Banff LakeLouise 6:10 6:15 8:00 1 Banff Canmore 7:30 7:45 12:00 1 LakeLouise Cochrane 0:00 8:00 11:00 1

Table 5.3: In retrospect, all fares fulfilled, including dynamic ones.

4:00am frame shows only the planned tasks, but more appear dynamically overtime.

Only as a post mortem does the full list of requests, Table 5.3, become available. We see that although the time windows were met and all fares were fulfilled, there was possibility to do the same work in less time and less distance, had there been knowledge about all of them.

At 4:00 AM

At 6:10 AM

At 7:30 AM

At 8:20 AM

At 10:39 AM Figure 5.4: A sequence of requests being fulfilled by a single vehicle, some known in advance, some dynamic.

This fictional limousine service is a close real world representation of the particular variety of

PDPTW we are examining. Fares are mixed between those that were known ahead and those that

56 were picked up on-the-fly. It is precisely this partial knowledge of the future that makes it a good application domain. The next sections describe this PDPTW variety formally.

5.1 The Environment

Rather than environments of varying sizes, an 11 11 grid is used for PDPTW. It is sufficiently × complex for any experiments we are doing. This environment has a depot at the center vertex (l5,5), where all vehicles begin and end every run instance.

l0,0 l1,0 l2,0 l3,0 l4,0 l5,0 l6,0 l7,0 l8,0 l9,0 l10,0

l0,1 l1,1 l2,1 l3,1 l4,1 l5,1 l6,1 l7,1 l8,1 l9,1 l10,1

l0,2 l1,2 l2,2 l3,2 l4,2 l5,2 l6,2 l7,2 l8,2 l9,2 l10,2

l0,3 l1,3 l2,3 l3,3 l4,3 l5,3 l6,3 l7,3 l8,3 l9,3 l10,3

l0,4 l1,4 l2,4 l3,4 l4,4 l5,4 l6,4 l7,4 l8,4 l9,4 l10,4

l0,5 l1,5 l2,5 l3,5 l4,5 l5,5 l6,5 l7,5 l8,5 l9,5 l10,5

l0,6 l1,6 l2,6 l3,6 l4,6 l5,6 l6,6 l7,6 l8,6 l9,6 l10,6

l0,7 l1,7 l2,7 l3,7 l4,7 l5,7 l6,7 l7,7 l8,7 l9,7 l10,7

l0,8 l1,8 l2,8 l3,8 l4,8 l5,8 l6,8 l7,8 l8,8 l9,8 l10,8

l0,9 l1,9 l2,9 l3,9 l4,9 l5,9 l6,9 l7,9 l8,9 l9,9 l10,9

l0,10 l1,10 l2,10 l3,10 l4,10 l5,10 l6,10 l7,10 l8,10 l9,10 l10,10

Figure 5.5: The standard pickup and delivery environment.

This 11 11 grid is transformed into a graph g11 = (V,E),E V V as before, with edges between × ⊆ × those vertices that are adjacent. However, the space is now continuous, allowing vehicles to be located on edges in addition to the vertices. Edge lengths are now Euclidean rather than uniform

(diagonals have weight √2). The location of a vehicle shall be defined as follows:

Loc = E [0,1] ×

57   Where (li,l j),0 would indicate that a vehicle is completely on li, while (li,l j),1 would indicate  that a vehicle is completely on l j, and (li,l j),0.5 would be the middle point. This leaves multiple representations of the same point, the simulator is responsible for detecting equivalences.

There are many additional, usually arbitrary, choices that must be made when constructing a Pickup and Delivery simulator. We use the SENSES simulator, which was developed for [Ka10]. It has been used for the EIA experiments in Pickup and Delivery in the past: ([SD+10], [SD+11],

[KB+10], [HD+10], [KBD09]). The following rules apply to the SENSES simulator:

Vehicles exist at any vertex, or any point on an edge. • All vehicles start and end every run instance at the depot. • Fares are picked up and delivered on vertices only. • Edges have weight according to Euclidean distance (vertical and horizontal are • weight 1, diagonal are weight √2).

All vehicles move at a speed no more than 1 distance per 1 time step. • Strict vehicle deadlock is enforced. No 2 vehicles can occupy the same vertex or • edge.

A fare is a commodity to be moved, and has an amount of that commodity, a pickup • location, a delivery location, an earliest pickup time, and a deadline.

When a vehicle is co-located with a fare’s pickup location, within the time window, • and the pickup action is performed, that fare’s commodity is moved to the vehicle’s

inventory.

When a vehicle is co-located with a fare’s delivery location, within the time window, • and the vehicle’s inventory contains that fare’s commodity, and the delivery action

is performed, that fare’s commodity is removed from the vehicle’s inventory.

58 When the specified amount of commodity is moved from the pickup location to the • delivery location for a given fare, the fare is marked as complete.

A vehicle may only carry one commodity at a time. • If a vehicle cannot find work to do, it should return to the depot, otherwise, it should • pursue the fares.

A problem (set of fares) is not considered complete until all vehicles have returned • to the depot.

The EEIA depends on none of this, which should be apparent when comparing the Pickup and

Delivery conventions with the Whack-a-mole conventions. The task structure, having multiple

locations, rather than single locations, is the most significant difference. Another is that PDP is

continuous, while Whack-a-mole is discrete. The distance measures are also different, counting

edges, versus Euclidean.

5.2 Dynamic Task Fulfillment Perspective

Tasks will be the customer requests, or fares. This makes a Run Instance, essentially, the set of

tasks completed in a day. And, then a full run represents weeks or months of work. Recurring tasks

throughout a run will be those that occur in the same place and time, each day for a sufficiently

long period.

Definition 5.2.1 (PDP Task). A PDP task is defined as:

PDP Task = R V V Time Time × × × ×  So, ta = w,lpickup,ldelivery,tstart,tend .

This captures the amount to be transported (w), the locations for pickup and delivery, and the time window. For the particular specialization of PDPTW we are using, this is sufficient. For other specializations, other attributes may be necessary.

59 5.2.1 Task Similarity

Unlike Whack-a-mole, knowledge about the full time window is made available to the vehicle.

Both the start and end times are used in comparison, as are both locations. Time windows exist in these tasks, but they have not been emphasized, as we will be focusing of other behaviors in experiments (see Chapter 6).

Definition 5.2.2 (PDP Task Similarity). Given a pair of PDP Tasks is defined as:

PDP PDP PDP sim :Task Task R × →  start end  start end w1,lp,1,ld,1,t1 ,t1 , , w2,lp,2,ld,2,t2 ,t2

αdist(lp 1,lp 2) + αdist(ld 1,ld 2) + β w1 w2 7→ , , , , | − | where dist(li,l j) is the shortest path length found by Dijkstra’s algorithm.

The constants α = 0.3 (tolerance in distance), β = 0.1 (tolerance in weight) are used in experiments

(see Chapter 6).

5.2.2 Optimization Criterion

The EIA and EEIA will perform the optimization using one of the problem-agnostic optimizers

(see Section 2.3.2 and Appendix A). For PDPTW, a genetic algorithm in chosen, however the branch and bound was another possibility. Both work in a way that does not acknowledge low level details (for example, PDPTW versus Whack-a-mole). A solution quality measure qual :

Solution R, is the one and only component that the advisors require to do this, and it will hide → (abstract) all problem-specific detail.

This will be where all details of the simulator are encoded:

Speed limits • Distance (Manhattan, Euclidean, Manually specified, ...) •

60 Other parameters are treated as global constants:

Number of vehicles • Number (and detail) of tasks • Layout of environment graph • While these things can vary from experiment to experiment, these are always constant throughout single optimizations.

Running a simulator easily produces statistics such as distance and time, but simulators are rela- tively slow. More importantly, our simulator only produces new sets of assignments. It does not evaluate solutions except when it produces them. Specifically, it only shows how the tasks can be handled by a strategy (like FIFO, NN, DIC, or some other, with or without advice), it does cannot execute arbitrary solutions (assignment lists). Many of these arbitrary assignment lists describe impossible things, like breaking speed limits, or being in 2 places at once, and are incompatible with the simulator.

The optimization criterion (qual function, see Section 2.3.2) needs to evaluate solutions directly, adding up distances and applying constraints. Impossible things can be detected and penalized by qual, via geometry, and be handled in ways the simulator is not able.

The particular qual function described in this section (and used in Chapter 6) calculates the final completion time of a set of assignments.

Consider a list of assignments, sol, given as argument to qual. These are assumed to be sorted first

61 by assigned vehicle, second by pickup time.

sol = [(ta1, g0,t1), A

(ta2, g0,t2), A

(ta3, g0,t3), A . .

(tak 1, g1,tk 1), + A +

(tak 2, g1,tk 2), + A +

(tak 3, g1,tk 3), + A + . .

]

Every assignment, (tai, gi,ti), will be checked against the following criteria: A start end Where tai = (w,lp i,ld i,t ,t ), validity requires: • , , i i

start end t ti < t i ≤ i

Failing this requirement marks an assignment as late.

Where i > 1 and tai and tai−1 are assigned the same vehicle, validity (on the latter) • requires:

ti ti−1 > dist(ld i−1,lp i−1) + dist(lp i,ld i−1) + 2 − , , , , The +2 adjustment represents the 2 service actions (pickup and delivery), which re-

quire a time step each. Failing this requirement marks an assignment as impossible

(distance was traversed faster than 1 per time step).

Where i = 1 or tai and tai−1 are not assigned the same vehicle, validity (on tai) • requires:

t > dist(l ,ldepot) + 2 i i Ag j

62 Where g j is the vehicle assigned to tai. Failing this requirement marks an assign- A ment as impossible (distance was traversed faster than 1 per time step).

This way, each assignment is evaluated as valid,late,impossible . In case of multiple conditions { } met or failed the following acts as tie breakers:

If an assignment is not late or impossible, it is valid by default. • If an assignment is both late and impossible, impossible takes priority. • If multiple assignments exist for the same task, all are kept and included in the score • (all will influence planning).

For each late or impossible task a penalty is given to the score. Although in most cases it would make sense to penalize one more than the other, the PDP experiments will not have as big of an emphasis on time windows as the Whack-a-mole experiments did. Both late and impossible assignments will result in a penalty of 10000 each. This is a large enough penalty that a single violation removes the plan from consideration when optimizing.

With penalties decided, consider a subset of sol:

 finalas = (tai, gi,ti) (tai, gi,ti) sol, (ta j, g j,t j) sol, gi = g j,t j > ti A A ∈ ¬∃ A ∈ A A

In other words, the set finalas contains the final assignment for each vehicle (assuming no impossible concurrent assignments on a single vehicle). Each of these does mark a time for the vehicle outside of the depot, but still does not include

1. time required to cross the distance from pickup location to delivery location for the

final task

2. time required to cross the distance from delivery location to the depot

63 Adding these final costs for each vehicle we can define the quality function:

PDP qual :Solution R →   sol max ti + dist(lp,i,ld i) + dist(ld i,ldepot) + 1 tai, gi,ti finalas 7→ , , A ∈ + ∑ penalty(as) as∈sol start end where tai = (x,lp,i,ld,i,ti ,ti )

This will return the time at which the final vehicle reaches the depot after all tasks are completed.

This is specifically the final completion time for the run instance. It should not be confused with the total drive time, or the accumulated “billable hours” of the drivers.

5.3 Agents

The SENSES simulator developed for [Ka10] is reused for this chapter, as well as the experiments.

It is a multi-agent pickup and delivery simulator, which follows the definitions presented so far. It has the additional feature of a Digital Infochemical Coordination (DIC) simulation, which allows agents to emit and sense infochemicals, and calculate the way these infochemicals propagate and evaporate. This has allowed situation and communication infrastructure to be abstracted to the level of infochemicals.

Definition 5.3.1 is what an agent could be built upon for PDPTW if the choice was made not to use

DIC. Rather than using Definition 5.3.1 as the interface for agents, we use Definition 5.3.2, which takes the infochemical infrastructure for granted.

Definition 5.3.1 (General PDP Agent). A Vehicle Agent is an agent

gVeh = (SitVeh,ActVeh,DatVeh, f Veh) A Ag where:

64 SitVeh = Loc Time Load∗ Task∗ Message∗ - At each time step, the vehicle • × × × × is given complete information for whichever tasks are visible to it. It will also

be informed of its current time, location, contents of its inventory, and received

messages. Here, Load = Task R, indicating a task (and amount) held in inventory. × Veh Act = movel, pickupta,deliverta,sendx - Each action is parameterized: • { }

– movel - l Loc indicates a new location (should be less than dis- ∈ tance 1 away from current location).

– pickupta - ta Task indicates a commodity to load into the vehicle’s ∈ inventory (should be an active task at the vehicles current location).

– deliveryta - ta Task indicates a commodity to unload from the ∈ vehicle’s inventory (should be an active task at the vehicles current

location, and a commodity in the vehicle’s inventory).

– sendx - x Message is a message to be sent to other vehicles. ∈ DatVeh - determined according to the chosen PDP solution. • f Veh : Sit Dat Act - determined according to the chosen PDP solution. • Ag × → The dynamic solution chosen for the pickup and delivery experiments is Digital Infochemical

Coordination (DIC), based on [KDB08] and [KBD09]. In practice, DIC is very similar to nearest neighbor, except that it creates slightly more cooperative behavior when multiple agents work together.

This works by simulating the propagation of semiochemicals (also known as infochemicals) through the air, emitted by various entities in the environment. Infochemicals will be stronger or weaker from the perspective of the agents depending on how far the agents are from the source. Some in- fochemicals will attract agents while others will repel. Some infochemicals will be completely ig- nored at times, depending on the internal state of the agent. Propagating a variety of infochemicals

65 l0,0 l1,0 l2,0 l3,0 l4,0 l5,0 l6,0 l7,0 l8,0 l9,0 l10,0

l0,1 l1,1 l2,1 l3,1 l4,1 l5,1 l6,1 l7,1 l8,1 l9,1 l10,1

l0,2 l1,2 l2,2 l3,2 l4,2 l5,2 l6,2 l7,2 l8,2 l9,2 l10,2

l0,3 l1,3 l2,3 l3,3 l4,3 l5,3 l6,3 l7,3 l8,3 l9,3 l10,3

l0,4 l1,4 l2,4 l3,4 l4,4 l5,4 l6,4 l7,4 l8,4 l9,4 l10,4

l0,5 l1,5 l2,5 l3,5 l4,5 l5,5 l6,5 l7,5 l8,5 l9,5 l10,5

l0,6 l1,6 l2,6 l3,6 l4,6 l5,6 l6,6 l7,6 l8,6 l9,6 l10,6

l0,7 l1,7 l2,7 l3,7 l4,7 l5,7 l6,7 l7,7 l8,7 l9,7 l10,7

l0,8 l1,8 l2,8 l3,8 l4,8 l5,8 l6,8 l7,8 l8,8 l9,8 l10,8

l0,9 l1,9 l2,9 l3,9 l4,9 l5,9 l6,9 l7,9 l8,9 l9,9 l10,9

l0,10 l1,10 l2,10 l3,10 l4,10 l5,10 l6,10 l7,10 l8,10 l9,10 l10,10

Figure 5.6: The infochemical gradient. The vehicle’s infochemicals are in red, the pickup loca- tion’s infochemicals are in green, and the delivery location’s infochemicals are in blue. The depot’s infochemicals are omitted from the diagram.

in this way creates a gradient that the vehicles can use to navigate, illustrated in Figure 5.6.

Agents will still experience the strongest attraction to the tasks nearest to them, but they will also

weakly repel each other, making them less likely to crowd around the same tasks, and more likely

to spread out. This represents a slight advantage over NN, as it gives agents a way to organize and

coordinate. Since these infochemicals establish a gradient, they are used for path-finding as well

as deciding the execution order of tasks.

The simulated concentration is the most important of many factors used when an agent is deciding

which infochemical to follow. Other such factors are incorporated into what is called the utility

calculation, described in full in [KDB08]. The utility of an infochemical is primarily based on its concentration, however the value is manipulated to create other effects. For example, when an agent follows the same infochemical for several time steps, its utility is amplified, increasing

66 its priority. When another agent is detected pursuing the same task, and will reach it sooner, the

utility is diminished (completely). The important thing to know is that setting the utility of an

infochemical to 0 essentially erases it from the agent’s perception.

Definition 5.3.2 (DIC Vehicle Agent). A DIC Vehicle Agent is an agent

gDIC = (SitDIC,ActDIC,DatDIC, f DIC) A Ag where:

SitDIC = Time Loc Load∗ In f ochemical∗ • × × ×

The current time, current location, current inventory (Load = Task Z), and all × perceived infochemicals.

ActDIC = ActPDP • DatDIC - Description of the environment and last infochemical followed • f DIC : Sit Dat Act For sit = (lcurrent,tcurrent,inv,in f o), perform the calcula- • Ag × → tions:

1. If inv contains a task with delivery location matching lcurrent, per-

form delivery.

2. If inv is empty and in f o contains a task with pickup location match-

ing lcurrent, perform pickup.

3. If no task infochemicals are detected, follow the depot infochemicals

depot (movel for neighboring l V, where sem (l) is maximized). ∈ 0 4. Else, follow the infochemical, sem , with max utility (movel for neigh- boring l V, where sem0(l) is maximized). ∈ This gDIC is an agent that will essentially perform gradient descent on infochemicals. The gra- A dient is decided by the utility calculation, which incorporates other factors such as the location of

67 other agents and the last infochemical followed. It will also change depending on what is present

in the vehicle’s inventory.

5.4 Required Modifications of Agents for Working with An Advisor

Like was done in Section 4.4, the agent gDIC will be modified to allow advice from the EEIA. A This requires 2 new abilities, discussed separately:

1. Gathering and returning histories.

2. Accepting and following rules.

5.4.1 Providing Histories

This is far simpler than the Whack-a-mole case. While there is still a need to keep a log on the

agent side, actual tasks are included in the situation, so no forensics is required.

Given a log (as part of the agent’s Dat), at each new time step, the situation is as follows:

sit = (lcurrent,tcurrent,inv,in f o)

Each semta in f o will contain the information about its associated task, making it possible to ∈ convert in f o tasks. Then these detected tasks can be added to the log: 7→

lognew = logold tasks ∪

The same tasks will be observed this way for multiple timestamps until they are picked up, or expire. Multiple log entries for the same task in a single run instance would not be useful, so any identical tasks in this log are eliminated at this stage. The resulting log can be returned directly to the EEIA, with all information required to learn recurring tasks.

68 5.4.2 Handling Rules

The technique for applying rules to the DIC agent should seem very familiar. It is not quite identical

to the one used in Section 4.4.2, but is done in the same spirit.

The situation perceived by the agent is modified, then the agent follows its normal decision making

process based on the filtered reality. For gDIC, this can be done entirely through the utility A calculation.

tak tak

Ignore taj at lj

taj taj

tai tai

Figure 5.7: The application of an ignore rule to a utility calculation for gDIC. A

When applying an active ignore rule for a task, ta j, the utility of that task’s pickup infochemicals is set to 0, as is done when the vehicle’s capacity is full, or if another vehicle will reach the task first. It

is simply eliminated from consideration. This is illustrated in Figure 5.7, where the infochemicals

emitted from the pickup location of ta j are omitted from the calculation, despite the fact that the task still exists unfulfilled.

When applying an active proactive rule for a task, taq, a whole new infochemical must be created, whether that task really exists or not. The rule will set the utility of every infochemical (except this new one) to 0. All other distractions are eliminated from the situation of the agent, and only the new infochemical is visible, giving it only one choice. This is illustrated in Figure 5.8.

69 tak tak

prep(taq) at lq

taj taj

tai tai

Figure 5.8: The application of a proactive rule to a utility calculation for gDIC. A

5.4.3 Rule Conflicts

The rules application for DIC could certainly cause something comparable to the “empty queue” problem (see Section 4.4.3). gDIC has a slightly more elaborate solution to this than gFIFO and A A gNN did, but the effect is the same. A When a proactive rule is in effect for gDIC, the ignore rules are simply not calculated. This has A an identical effect to sorting the list, as proactive rules completely overwrite the prior calcula- tions.

Ignore rules do not conflict with each other, as before. They can simply be applied in any number and order, with the same effect. Proactive rules will conflict, though. To solve this, the gDIC A calculates a suitable proactive rule (based on expiration) if multiple proactive rules are active at once.

The EEIA’s rule construction will use chaining (see Section 3.2), and should simply not create rule sets where this is necessary, but the ability to resolve these problems is in place if they occur.

70 Chapter 6

Experiments

This chapter presents the experimental results of applying an advisor to both of the previously dis- cussed application domains. While it should be clear how an advisor applies to all specializations of DTF, it remains to be shown that it can have the same effectiveness in different settings. Ideally we will see the same improvements in efficiency no matter where the EIA or EEIA is applied.

The motivation behind extending the EIA to create the EEIA was primarily to make use of cer- tain prior knowledge. Intuition about these problems indicates that any knowledge not applied will come at the cost of performance, so certain knowledge, regardless of source should be in- cluded.

Verifying that this has been achieved was done over 3 series of experiments. First, the Degree of Dynamism experiments show how well the amount of knowledge correlates with efficiency. Second, the Static Variant experiments show how well the advised agents compete with solutions to these same problems when they are statically optimized. Third, to show that the EEIA is an improvement over the EIA, the advisor is placed in its natural environment, with Full Runs containing actual recurring tasks that can be detected and applied in planning.

Each of these 3 experimental series will be applied to Whack-a-mole first. Under each, the Whack-a-mole subsections will show both the Nearest Neighbor (NN) strategy with 2 hammers, and the First-in-first-out (FIFO) strategy with 1 hammer.

The experiments will each then be repeated with 2-vehicle Pickup and Delivery, using Time as an efficiency measure.

Finally, because the Multi Agent System community is rarely satisfied by only 2 agents, higher numbers of agents will be given their own subsection for the final (full run) experimental series. This will be done using Pickup and Delivery as the specific problem domain.

When adding more agents, and measuring efficiency based on time, a problem does arise. The

score of the purely dynamic (unadvised) system actually gets closer to the optimal. Consider an

extreme case where there is one vehicle per task. The problem can be solved in near optimal

time with no calculation by simply deploying all vehicles, even though this should not be seen as

efficient behavior.

To address this problem, for higher numbers of vehicles, the efficiency measure will be changed to distance. This also has the added benefit of further displaying the generality of the advisor.

6.1 Generating Base Run Instances

How random run instances are generated can have an effect on the results. All dynamic strategies (including NN, FIFO, and DIC) will have special cases where they reach an optimal solution, or come very close. Depending on the quality measure used, these special cases can sometimes be frequent, but they are usually very rare.

The run instance generation procedures here are tuned specifically for cases where the emergent solutions are of poor efficiency. These procedures are used in all of the following experimental setups, anywhere a random task or random run instance is created.

6.1.1 Whack-a-mole

All experiments used in the Whack-a-mole analysis will use the same technique for generating random run instances. The environment size will vary, with small environments being 3 × 3, and large ones being 9 × 5 or larger. This is how the run instance size will be decided. It would have been entirely possible to put any number of moles in random locations on the graph, but we have decided to place exactly one on each vertex of the environment graph. Multiple moles could exist in the same hole, but not if their time windows overlap. Random generation within this constraint is simply easier when moles each have their own hole.

The choice of these time windows involves some tuning, so we will specify the following:

Parameter Name   Description                                        Recommended Value
spread           The allowed time range, [0, spread], for t^start   30 FIFO (50 NN)
inv              The average interval size                          12 FIFO (5 NN)
inv_var          The variance allowed on an interval                5 FIFO (3 NN)

Random¹ mole tasks are generated using 2 random integers (t^start, t^end), for a location, l:

t^start ∈ [0, spread]
t^end ∈ [t^start + inv − inv_var, t^start + inv + inv_var]
mole^random = (l, t^start, t^end)

Then the resulting event would be defined ev = ((l, t^start, t^end), t^start).

Then for an N × M world, we iterate this formula N · M times (a t^start_{i,j} and t^end_{i,j} for each location l_{i,j}), and generate a new random run instance:

ri = { (l_{i,j}, t^start_{i,j}, t^end_{i,j}) : i ∈ [0, N), j ∈ [0, M) }

Uniqueness is important in run instance construction and is provided by the choice of exactly 1 mole per location.
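A minimal Python transcription of this generation procedure follows. The original experiments used the JavaScript Math library; the sketch below is equivalent logic under that assumption, not the actual experiment code.

import random

SPREAD, INV, INV_VAR = 30, 12, 5   # recommended FIFO values from the table

def random_run_instance(n: int, m: int) -> list:
    """One mole per vertex of an n-by-m world: (location, t_start, t_end)."""
    ri = []
    for i in range(n):
        for j in range(m):
            t_start = random.randint(0, SPREAD)
            t_end = random.randint(t_start + INV - INV_VAR,
                                   t_start + INV + INV_VAR)
            ri.append(((i, j), t_start, t_end))
    return ri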

Of course, the parameters have an enormous effect here. Choosing hit-miss ratio as a quality measure means that when inv is large enough, all moles will eventually be whacked, even by the FIFO strategy. We then run the risk of not being able to distinguish between FIFO, NN, and the statically optimized solution in terms of their performance. For this reason, the parameters must be tightened. Section 6.2 shows the wedge between dynamic and optimal performance created by the current parameters.

¹Randomness generated from the JavaScript Math library, in Firefox, Chrome, Safari, and NodeJS. Experiments have been run on a variety of systems.

6.1.2 Pickup and Delivery

Unlike in Section 6.1.1, we will be holding the environment size constant and generating the number of tasks we want in a spatially random way. Our N × M graph will be strictly an 11 × 11 graph.

Random² PDP tasks are generated using random integers within specified intervals, as was done with Whack-a-mole. Without the shortcut of generating tasks on separate deterministic locations, we need to establish constraints, and re-generate random tasks that violate them.

The weight for all tasks is set to 20, and the vehicle capacity is also 20 (making these experiments unit-weight). The simulator itself enforces that an agent cannot perform multiple simultaneous tasks; however, it allows agents to perform larger tasks in multiple trips if the weight is higher than the agent capacity. We have removed that complication for all experiments described here.

All other parts of ta^random = (w, l_p, l_d, t^start, t^end) will vary according to random number generation. Again, this involves a set of tunable parameters:

Parameter Name   Description                                        Recommended Value
spread           The allowed time range, [0, spread], for t^start   120
inv              The average interval size                          180
inv_var          The variance allowed on an interval                0

Unlike in the Whack-a-mole case, we are optimizing plans and judging solutions based on time required. It is assumed that no tasks will be missed. The distinction between the best and worst solutions will be thoroughly established based on time, no matter how easy these large time windows appear to make the problem.

²Randomness generated from the Java Random class, in multiple versions of Java (all in 1.8.*), on a variety of operating systems.

The following calculation is performed:

x_p ∈ [0, N]
y_p ∈ [0, M]
x_d ∈ [0, N]
y_d ∈ [0, M]
l_p = l_{x_p, y_p}
l_d = l_{x_d, y_d}
t^start ∈ [0, spread]
t^end ∈ [t^start + inv − inv_var, t^start + inv + inv_var]
ta^random = (20, l_p, l_d, t^start, t^end)

A constraint is applied to the random task:

• l_p ≠ l_d

If this constraint is violated, the random task is discarded and regenerated (repeatedly if necessary), to find one that complies.

To generate a random base run instance, bri ∈ RunInstance, with m tasks, we iterate. For each randomly generated task, ta^random, a constraint is applied:

• for each ta_i ∈ bri, sim^pdp(ta_i, ta^random) > 0

When a ta^random violates this constraint, it is discarded and regenerated (as many times as necessary). This is meant to guarantee no duplicates in bri, and is specifically a uniqueness of Tasks, not only of Events. Two announcements (at different times) of the same task would be distinct events, but the same task, leaving the run instance one task shorter than intended.

Iterating to completion means that the base run instance will be generated with |bri| = m. It will have tasks spaced in a roughly uniform, though unpredictable, way. And it will have time windows on all tasks large enough that deadlines are irrelevant.
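A sketch of this constrained generation is below. The stand-in sim_pdp treats only exact duplicates as similar, which is an assumption made for illustration; the real measure is the task similarity of Section 5.2.1.

import random

N = M = 11
SPREAD, INV, INV_VAR, WEIGHT = 120, 180, 0, 20

def sim_pdp(a, b):
    return 0.0 if a == b else 1.0  # stand-in: 0 only for exact duplicates

def random_pdp_task(bri: list) -> tuple:
    """Regenerate until l_p != l_d and no duplicate exists in bri."""
    while True:
        lp = (random.randint(0, N), random.randint(0, M))
        ld = (random.randint(0, N), random.randint(0, M))
        t_start = random.randint(0, SPREAD)
        t_end = random.randint(t_start + INV - INV_VAR,
                               t_start + INV + INV_VAR)
        ta = (WEIGHT, lp, ld, t_start, t_end)
        if lp != ld and all(sim_pdp(ta_i, ta) > 0 for ta_i in bri):
            return ta

def random_base_run_instance(m: int) -> list:
    bri = []
    for _ in range(m):
        bri.append(random_pdp_task(bri))
    return bri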

6.2 Degree of Dynamism

Degree of Dynamism refers to the relationship between the amount of foreknowledge and the resulting performance (originally in [LMS02]). The expectation is that more knowledge means better performance, because it allows for planning. Systems using partial knowledge properly should display a correlation between efficiency and knowledge.

The EEIA’s abilities are dependent on the knowledge available to it, whether that be certain knowl- edge of known-ahead tasks, or learned knowledge from recurring tasks. The experiments in this section are meant to reveal the relationship between the amount of knowledge and the performance gain.

We measure the amount of knowledge by the number of tasks known. Consider an example where |bri| = 4, a very small run instance. The task collection will be bri = {ta_0, ta_1, ta_2, ta_3}. Depending on the desired size, the subset known ⊆ bri could have any of the following values:

|known| = 0:  ∅
|known| = 1:  {ta_0}, {ta_1}, {ta_2}, {ta_3}
|known| = 2:  {ta_0, ta_1}, {ta_0, ta_2}, {ta_0, ta_3}, {ta_1, ta_2}, {ta_1, ta_3}, {ta_2, ta_3}
|known| = 3:  {ta_0, ta_1, ta_2}, {ta_0, ta_1, ta_3}, {ta_0, ta_2, ta_3}, {ta_1, ta_2, ta_3}
|known| = 4:  {ta_0, ta_1, ta_2, ta_3}

The columns drawn here in the subset lattice represent the available values for |known|/|bri|. If |known|/|bri| = 0, we know known = ∅ and there is no ambiguity in which subset we could use. Likewise, |known|/|bri| = 1 gives known = bri, and there is also no ambiguity. However, when |known|/|bri| = 0.5, there are multiple choices:

{ta_0, ta_1}, {ta_0, ta_2}, ...

In a simplistic way these choices do represent the same amount of knowledge, but that does not mean they are going to have exactly the same effect on efficiency. The amount of knowledge alone is not all that matters, though. The emergent solution will have different performance based on the number of tasks known, but it also depends on which tasks. Choosing known ⊆ bri is another opportunity to cherry-pick details that produce favorable results in experiments, so it must also be done randomly. Better yet, multiple possibilities should be chosen for each level of |known|/|bri|, attempting to capture the variance. For |bri| = 4, there are only 16 possibilities, and every single one can be tested. That is not feasible as |bri| increases: the subset lattice has size 2^|bri|. This ratio, |known|/|bri|, is formally called the Degree of Dynamism in the literature [LMS02].

The experimental setup is as follows. Given a run instance, bri, generated according to Section 6.1, we perform the following:

1. Select a random subset: known ⊆ bri
2. Provide known to the EEIA in the merge step.
3. Run the simulation using the rules derived, obtaining a solution, sol.
4. Record the performance as a pair (|known|/|bri|, qual(sol)).

These steps are iterated using the same bri but a different random known each time. For the experiments in this section, we generate 100 random subsets for each bri.
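The experiment loop can be sketched as follows; run_with_advice and qual are placeholders for the EEIA pipeline and the domain's quality measure, respectively, and are assumptions of this sketch rather than the thesis's actual code.

import random

def run_with_advice(bri, known):
    raise NotImplementedError  # placeholder: merge, optimize, derive, simulate

def qual(sol):
    raise NotImplementedError  # placeholder: the domain's quality measure

def degree_of_dynamism_trials(bri, trials=100):
    data = []
    for _ in range(trials):
        k = random.randint(0, len(bri))         # random subset size
        known = random.sample(bri, k)           # 1. select a random subset
        sol = run_with_advice(bri, known)       # 2.-3. merge step + simulation
        data.append((k / len(bri), qual(sol)))  # 4. record (ratio, qual(sol))
    return data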

An important distinction to make is that these are individual run instances and not a run. There is

no opportunity for learning in this experiment. receive, transform, and extract provide precisely

0 knowledge, and all knowledge that does come in is provided in the merge step.

In our results, we see a near-correlation between knowledge and performance. However, this appears only beyond some minimum knowledge set size and minimum problem size. After about 20% of tasks are known, in large enough run instances, the correlation is clear. To measure this, we use the Pearson correlation coefficient, including only data points where |known|/|bri| > 0.2. This coefficient is represented by the symbol ρ⁺₂₀.
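Computing ρ⁺₂₀ from the recorded pairs is a plain Pearson correlation restricted to ratios above 0.2, for example (a dependency-free sketch; the thesis does not specify how the statistic was computed in practice):

from math import sqrt

def rho_20_plus(data: list) -> float:
    """Pearson correlation over (ratio, qual) pairs with ratio > 0.2."""
    pts = [(x, y) for x, y in data if x > 0.2]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    cov = sum((x - mx) * (y - my) for x, y in pts)
    sx = sqrt(sum((x - mx) ** 2 for x, _ in pts))
    sy = sqrt(sum((y - my) ** 2 for _, y in pts))
    return cov / (sx * sy)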

6.2.1 Whack-a-mole

In these experiments each data point is a single run instance, and in the scatter-plots one dot is also

the quality of a single run instance. First, we examine how well a single FIFO hammer performs

in this experiment.

Several things about Figure 6.1 are immediately apparent. First, for larger worlds (4 × 5 or larger) the correlation between knowledge and performance is very clear, though it is not quite deterministic. Where the dotted red line shows the performance when no rules are applied, it can be seen that not all trials scored above this.

[Figure 6.1: The Degree of Dynamism experiment on varying problem sizes, using single hammer FIFO under advice (using Branch-and-Bound). (a) 3 × 3 world, ρ⁺₂₀ = 0.631. (b) 4 × 5 world, ρ⁺₂₀ = 0.861. (c) 5 × 5 world, ρ⁺₂₀ = 0.945. Each figure is a single bri. Red line indicates performance without advice.]

In the cases where very low knowledge was used, it is clear that it is possible to harm performance when applying knowledge. Where the world is very small (3 × 3), the FIFO and optimal solution qualities differ by only a single point. Meddling with the dynamic solution in this case should not be expected to improve it, because there is very little room for improvement.

In larger worlds (4 × 5 or larger), in the middle ranges where |known|/|bri| ≈ 0.5, the performance difference can be very large. The EEIA can be relied upon in these cases to improve performance rather than hurt it in the average case, but there will still be the occasional bad day. Overall, the application of knowledge is helpful.

Despite the new uncertainty introduced in the multi-hammer case (deadlock and estimated optimal), in Figure 6.2 we see the same correlation between knowledge and performance. This performance tends to be weak in the case of smaller worlds like the 4 × 4. This primarily comes from the NN solution working well in those cases even without advice. When advice is given, and rules are created, the solution can only potentially rise to the performance of the optimal score. If the NN has already achieved the optimal score (which it often does on the 4 × 4 world), adding rules and tampering with an optimal solution can only hurt performance. The 4 × 4 world results in Figure 6.2 can be expected if the dynamic solution does poorly. To show the level of variation that can result from different choices of bri, the 5 × 4 experiment for multi-hammer NN has been repeated 3 more times (see Figure 6.3).

[Figure 6.2: The Degree of Dynamism experiment on varying problem sizes, using multi hammer NN under advice (using GA). (a) 4 × 4 world, ρ⁺₂₀ = 0.555. (b) 5 × 4 world, ρ⁺₂₀ = 0.838. (c) 6 × 4 world, ρ⁺₂₀ = 0.881. Each figure is a single bri. Red line indicates performance without advice.]

For the larger worlds, the dynamic NN more consistently falls short of the estimated optimal score (which, itself, falls short of the exact optimal score), so the gains from applying rules will also be reliable.

[Figure 6.3: The Degree of Dynamism experiment repeated on the 5 × 4 world, using multi hammer NN under advice (using GA), with 3 different choices of bri: (a) ρ⁺₂₀ = 0.658, (b) ρ⁺₂₀ = 0.873, (c) ρ⁺₂₀ = 0.739. Each figure is a single bri. Red line indicates performance without advice.]

[Figure 6.4: The Degree of Dynamism experiment on varying problem sizes, using DIC under advice (using GA). (a) 10 tasks, ρ⁺₂₀ = 0.489. (b) 30 tasks, ρ⁺₂₀ = 0.879. Each figure is a single bri. Red line indicates performance without advice.]

6.2.2 Pickup and Delivery

The concept of a degree of dynamism applies to PDP as it did with Whack-a-mole. The experimental design is identical, but the formatting of results has been modified slightly to keep the presentation the same. The results here are presented in terms of percent worse than optimal, where lower values are preferred (contrasted against maximizing the number of hits). The graphs in Figures 6.4 and 6.5 have been inverted so that positive is down. This allows us to continue with the convention that better is up. We also continue to display each run instance as a single dot in the scatter-plots.

Transforming the data in this way reveals the same near-correlation seen with Whack-a-mole.

For each graph the choice of bri is a constant, but the choice of known ⊆ bri is varied. One point in the graph represents the same single run instance, with a different choice of known provided to the EEIA at the merge step, and no additional recurring knowledge.

Figure 6.4 shows similar effects to those seen in the Whack-a-mole case. We have translated the original result of the simulation (which was final completion time) into percent worse than optimal, which is referred to in these (and other) figures by the term Performance. For smaller run instances (10 tasks), adding knowledge does result in an improvement in efficiency; however, the gain is meager and inconsistent.

For larger run instances (30 tasks), there is a stronger linear correlation between knowledge and efficiency. To show that the results in Figure 6.4 are typical for 30 tasks, the experiment has been run 5 additional times for different random bri in Figure 6.5. Though this choice of base run instance does matter, and some are better than others, we are still seeing strong correlation in all cases.

[Figure 6.5: The Degree of Dynamism experiment repeated for 30 tasks in Pickup and Delivery with time measurement, with 5 more choices of bri: (a) ρ⁺₂₀ = 0.901, (b) ρ⁺₂₀ = 0.907, (c) ρ⁺₂₀ = 0.811, (d) ρ⁺₂₀ = 0.917, (e) ρ⁺₂₀ = 0.914. Each figure is a single bri. Red line indicates performance without advice.]

6.3 Static Variant

The degree of dynamism has shown that as long as the knowledge set is larger than some minimum threshold, adding more knowledge helps performance. It should be pointed out that the data points in the degree of dynamism experiments where full knowledge was given (where known = bri) should have near optimal efficiency in all scatterplots. However, it was not exactly optimal every time. This raises the question: How close to optimal is the advised system with full knowledge?

In order to be competitive with other solutions (in particular, those which also apply knowledge

for performance gains), we must be able to do well when complete knowledge is available.

Branch and bound will provide an exact optimal solution, and the genetic algorithm will usually

provide a very good solution. The problem is that having the best plan still means nothing if the

system under advice cannot execute the plan as described.

This experimental series will explore exactly that question. Trials will be generated as follows:

1. Choose a size (N × M for Whack-a-mole), with m tasks (input parameters of the experiment)
2. Generate a random base run instance, bri (as described in Sec. 6.1), according to these values
3. Provide the entire bri as static knowledge to the EEIA at the merge step
4. Simulate with rules from the optimal (or near optimal) plan, sol^opt, obtaining sol^sim
5. Record the data point (m, qual(sol^opt), qual(sol^sim))

These steps are iterated, generating a data set. Where the plan has been followed we would expect to see qual(sol^opt) = qual(sol^sim). If deviations from the plan occur, the quality difference will ideally be low.

Size     Trials   Hits Relative to Plan
3 × 3    1000     0 ± 0
3 × 4    1000     0 ± 0
4 × 4    1100     0 ± 0
5 × 4    1003     0 ± 0
Table 6.1: Advised single hammer FIFO in the static variant of Whack-a-mole, quality relative to a plan.

Size     Trials   Hits Relative to Plan
3 × 5    1001     0.24 ± 0.38
4 × 5    1001     0 ± 0.15
5 × 5    1001     −0.1 ± 0.16
6 × 5    1001     −0.11 ± 0.18
7 × 5    1001     −0.12 ± 0.21
Table 6.2: Advised 2 hammer NN in the static variant of Whack-a-mole, quality relative to a plan.

6.3.1 Whack-a-mole

The experiments included here use several thousand of these data points. The average number of hits, compared to the optimal, is how the tables should be interpreted. So, if the actual score, qual(sol^sim), in a single trial was 15, but the optimal was qual(sol^opt) = 16, the trial would be considered to have −1 relative hits.

For the single hammer case, these values are strictly non-positive, since the branch and bound is returning the exact optimal. However, the GA used for multi-hammer will very occasionally give a non-optimal plan, which, under the right circumstances, can be improved by dynamic action of the NN hammers.

Table 6.1 shows very good and very boring results. Without other hammers to cause “vehicle deadlock”, the hammer never deviates from the optimized plan, achieving the optimal in every case. Here, X ± Y indicates an average X hits more than the optimal, with standard deviation Y.

Table 6.2 is somewhat more interesting, since it shows that the NN hammers are capable of deviating from the plan. There are 2 main reasons for this:

85 1. Vehicle deadlock, or traffic jams (borrowing terms from PDPTW)

2. GA may fall short of optimal

The first reason, vehicle deadlock, will typically only cost one of the agents a single time step, but this can be enough to miss a time window. In fact, it sets the entire schedule behind by a single time step, so if one of the future assignments in the plan cuts too close to the deadline, then the agent will miss. This will give a negative, non-zero quality for that trial.

If the GA returns a less than optimal plan, there may be an opportunity for the dynamic system to apply any available free time to hit an extra mole. This could potentially give a positive non-zero score, but Table 6.2 shows this happens far less often than deadlock (when problem sizes increase). For small maps (see 3 × 5), this is happening since the NN hammers can reliably hit every mole, every time, in worlds of that size. The GA optimizer (as it was tuned) fell short more often.

6.3.2 Pickup and Delivery

PDP, being multi-vehicle (specifically 2 vehicles in this experiment), compares closer to the imperfect, multi-hammer case. Strict vehicle deadlock is enforced for PDP in the SENSES simulator, meaning that no vehicles can be simultaneously on the same vertex or edge. This does occasionally interfere with plans. It should be pointed out that the results are better here than in the multi-hammer case, as a result of the repelling vehicle infochemicals, and also the relative sparseness of the world.

The efficiency (qual) of each was measured and normalized for comparison as percent worse than optimal again, with the (±) range representing 1 standard deviation. Specifically, this is the qual-value of the solution that was used by the EEIA to generate the rules, not necessarily the exact optimal.

What is really being tested is the ability to follow a plan.

It should also be noted that the rare cases (seen in multi-hammer Whack-a-mole) where the agents

exceed the quality of their plan do not occur here. In Whack-a-mole, the optimal was often less than 100% hits, so those tasks seen as unreachable were left dynamic, leaving opportunity for the agents to improve.

m    Trials   Efficiency      m    Trials   Efficiency
4    50       0.24 ± 1.16     19   50       1.38 ± 1.76
5    50       0.10 ± 0.47     20   50       1.16 ± 1.55
6    50       0.29 ± 0.92     21   50       0.74 ± 1.09
7    50       0.10 ± 0.38     22   50       0.85 ± 1.14
8    50       0.39 ± 1.05     23   50       1.02 ± 1.42
9    50       0.48 ± 0.95     24   50       0.84 ± 1.17
10   50       0.29 ± 0.70     25   50       1.05 ± 1.41
11   50       0.52 ± 1.41     26   50       1.47 ± 1.47
12   50       0.55 ± 1.07     27   50       0.99 ± 1.17
13   50       0.87 ± 1.50     28   50       1.33 ± 1.68
14   50       0.79 ± 1.26     29   50       1.35 ± 1.65
15   50       0.94 ± 1.61     30   50       1.22 ± 1.69
16   50       0.66 ± 1.57     31   50       1.02 ± 1.48
17   50       0.85 ± 1.24     32   50       1.44 ± 1.65
18   50       0.67 ± 0.98     33   50       0.69 ± 0.92
Table 6.3: Average efficiency of executing plans in the static variant of PDPTW.

The PDP agents have no such opportunity. PDP tasks, due to the large time windows, are always included in the plan and assigned. They are fulfilled by an agent with a proactive rule, and not fulfilled by the agents with ignore rules. Actual performance will either match the quality of the plan or fall behind schedule due to “traffic jams” (a.k.a. vehicle deadlock). Plans will certainly be delayed by this some of the time, which is the source of the not-quite-optimal results. Even though the plan is only an estimate of the optimal (from the GA), it still represents fulfillment of all tasks, just in a potentially non-optimal permutation.

As Table 6.3 shows, the system using the EEIA is producing results within a 1 percentage point standard deviation of 0 (the “optimal”) for almost every level of m.

6.4 Full Runs

With the more basic qualities of the advisor displayed, the last experimental series is intended to

show the effect of the advisor in a more realistic environment, which requires all of its abilities

working together.

This will mean working on full runs, rather than individual, disconnected run instances. These will

present an opportunity for the EEIA to apply its learning abilities, but also introduce potentially

misleading knowledge.

These experiments require the following parameters:

Parameter Name   Description                                 Recommended Value
N × M            The size of the world                       variable or 11 × 11
m                Number of tasks                             variable
k                Size of the known subset                    5
δ                Controls ratio of recurring/dynamic tasks   0.75/0.85

Unlike the previous experiments where single run instances were generated, this experiment requires the creation of an entire run.

First, a base run instance bri = ((ta_1, t_1), ..., (ta_m, t_m)) of random events is created. Then each event is assigned a probability from the range [δ, 1.0], to be used in a weighted coin toss. Specifically, (ta_q, t_q) is given weight

w_q = δ · (m − q)/m + q/m,

so that the range [δ, 1.0] is covered uniformly. A run instance ultimately is created as

ri_i = ((ta'_1, t'_1), ..., (ta'_m, t'_m), (ta^known_1, t^known_1), ..., (ta^known_5, t^known_5))

where:

(ta'_j, t'_j) = (ta_j, t_j)            if the coin toss passes
(ta'_j, t'_j) = a new random (ta, t)   if the coin toss fails

[Figure 6.6: The coin-flipping procedure for generating run instances with partially recurring tasks.]

The set {(ta^known_1, t^known_1), ..., (ta^known_5, t^known_5)} consists of randomly created events that represent the second knowledge source, and will be used to show the relative difference in performance when known-ahead tasks are included in the optimize step. The system using only the EIA naturally will treat these events as dynamically announced ones.

The 5 known tasks obviously must be created in a way that preserves the uniqueness. For the

Whack-a-mole case, these 5 tasks take the place of the right-most 5 tasks in the lowest row in the world, in order to achieve uniqueness by the same “one mole per location” convention (resulting in a set of m tasks). In Pickup and Delivery, the tasks are simply generated uniquely, then added to the run instance as normal (resulting in a set of m + k tasks).

Each experiment consists of 55 run instances created this way:

run_j = (ri_1, ri_2, ..., ri_55)
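The weighted coin-toss procedure can be sketched as follows. random_task is a placeholder for the generation of Section 6.1, and the defaults mirror the recommended parameter values; this is an illustrative sketch, not the actual experiment code.

import random

def random_task():
    raise NotImplementedError  # placeholder: fresh random task per Section 6.1

def make_run(bri, known, delta=0.75, length=55):
    m = len(bri)
    # w_q = delta * (m - q)/m + q/m covers [delta, 1.0] uniformly
    weights = [delta * (m - q) / m + q / m for q in range(m)]
    run = []
    for _ in range(length):
        ri = [ta if random.random() < w else random_task()  # weighted toss
              for ta, w in zip(bri, weights)]
        run.append(ri + list(known))  # append the known-ahead events
    return run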

δ                  .75           .85
Recurring tasks    84% of m      93% of m
Accurate           78% of m      88% of m
Misleading         6.5% of m     5.8% of m
Table 6.4: The effect of δ on the knowledge set.

The length, 55, is mostly arbitrary. It is (more than) large enough that a stable average performance can be captured, but limited enough that experiments can complete in reasonable time.

This procedure (illustrated in Figure 6.6) creates a run with run instances containing 3 types of tasks:

1. Recurring tasks (ones with a high tendency to get “heads”)

2. Non-recurring tasks (ones with a high tendency to get “tails”)

3. Known-ahead tasks (those in the subset known)

Additionally, those tasks with a high likelihood to get “heads” will be detected as recurring by the extract step but will still occasionally get a “tails” and be excluded from the run instance (replaced by a random mole). This creates a chance for this knowledge to be misleading, and demonstrates the EEIA’s ability to handle such situations. The failed coin tasks will be random tasks, and handled dynamically.

Generating run instances in this way creates enough recurring data for the g_A^EIA to use, while also introducing uncertainty that did not exist in the previous experimental series. No matter which w_q is used for a task, it is not 100%. For pastwind = 5, and minocc = 0.7, as we have configured the extract step (see Section 2.3.1), any task that has occurred in at least 4 out of 5 of the most recent run instances will be considered recurring knowledge. Because the weight is not 100%, there is a small possibility that this task will not occur in the next run instance. It is still certainly included in the planning, which means the plan will lead the vehicles astray and hurt efficiency. This is what we call misleading knowledge: the task is planned but does not occur. We would call it accurate knowledge if it was planned and did occur. Misleading knowledge is a real world risk that always exists when learning from a noisy data set.

The parameter δ will be used to control the amount of misleading knowledge that is introduced, and will also affect the general difficulty of the run. The statistical effect of δ is described in Table 6.4.

The first few run instances (ri_i for i < pastwind) in each experiment are allowed to run without advice, only collecting history data to be used by g_A^EIA and g_A^EEIA. Here pastwind = 5 in all experiments (see Section 2.3.1). Each experiment is performed with 3 system configurations:

• base - The solution with no advice of any kind

• EIA - The solution with rules based on recurring knowledge, where the events in {(ta^known_1, t^known_1), ..., (ta^known_5, t^known_5)} are excluded from the merge and optimize steps, and are handled dynamically. Otherwise, groups of exception rules as described in Section 2.3.3 are created for the recurring events.

• EEIA - The solution with rules based on both known and recurring events, where the events in {(ta^known_1, t^known_1), ..., (ta^known_5, t^known_5)} are included in the merge and optimize steps.

For each run instance in each experiment, qual is calculated. This will specifically be the total time required for each run instance. The measure of an experiment is then the sum of qual for each run instance it contains (excluding ri_i with i < pastwind).
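In code, the experiment measure is just a truncated sum (a sketch; the per-run-instance qual values are assumed to be precomputed):

PASTWIND = 5  # warm-up instances run without advice

def experiment_measure(run_quals: list) -> float:
    """Sum qual over ri_i with i >= pastwind, as described above."""
    return sum(run_quals[PASTWIND:])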

6.4.1 Whack-a-mole

Table 6.5 shows the single hammer FIFO results.

Each trial is represented in 2 formats:

1. The accumulated hits (base, EIA, EEIA), where higher is better.

2. Average hits per run instance less than optimal, ± 1 standard deviation, (base(rel), EIA(rel), EEIA(rel)), where closer to zero is better.

Here, the choice of δ shows its clear effect. The δ = 0.75 problems are far more difficult to give advice for, as there will be a much higher ratio of misleading knowledge. Still, this does not present a barrier in all cases.

As can be seen in several rows (#s 1, 2, 25, 26, 27, 28, 29) in Table 6.5, it is common for the unadvised

FIFO strategy to achieve the optimal hits if the run instance is small enough (m = 8). In this

case, adding rules can only do harm. For large run instances, with m = 20, the results are much

better.

Apart from the individual trials, the more important observation is a clear correlation between the

run instance size, and the size of the improvement. For smaller run instances, providing advice

tends to do more harm than good. Typically in these cases the dynamic solution is already performing very well, so optimization is unnecessary. However, as the run instances increase, the

advantage of the EEIA also increases. This is very promising, as real life run instances tend to be

large if they require optimization which cannot be done by hand.

This increase in performance over the base system is present for both values of δ. In a real life

application there will likely be a reliable ratio of accurate to misleading knowledge (over the long

term). A δ could be chosen to estimate this ratio, and if the pattern holds, there should be a

threshold for m where results become positive.

We see mostly similar results in the multi-hammer NN case (Table 6.6). There are a few rows,

(#s 7, 12, 24, 48), in this table in which the NN hammers did not respond well to advice, or were overwhelmed by the misleading knowledge. Still, even with these unfortunate trials, the average case is still good. This can be seen by taking the average improvement (with the EEIA's advice compared to no advice) over multiple runs. For δ = 0.75, m = 35, we see the EEIA's advice providing an average of 0.95 more hits per run instance (compared to the unadvised strategy). For δ = 0.85, m = 35, the same comparison shows 2.49 hits per run instance. The EIA shows a significant improvement over the base system, and the EEIA shows further improvement on the

EIA’s performance.

6.4.2 Pickup and Delivery

The results in Table 6.7 closely mirror those in Table 6.5, being better than those in Table 6.6.

Each trial is represented in 2 formats:

1. The accumulated time (base, EIA, EEIA), where lower is better.

2. Average percent worse than optimal, ± 1 standard deviation, (base%, EIA%, EEIA%), where closer to zero is better.

The relative improvement over the unadvised system again is small for small run instances, but scales in a desirable way. For run instances of m = 30, we are seeing the EIA as a strong improvement over the unadvised base system, and the EEIA, a further improvement on that.

It is clear to see that the earlier observation in the Whack-a-mole results also applies here. The size of the improvement over the base system correlates with the size of the run instance, which correlates with the size of the recurring knowledge set.

6.4.3 More Vehicles

Varying sizes of run instances has indicated good scaling ability for systems under advice. How- ever, this is only half of the issue. To scale well, an advisor must also give good advice to larger groups of agents. This section shows the results of the full run experiments when we vary the

93 # N M m δ base base(rel) EIAEIA(rel) EEIAEEIA(rel) × 0 2 4 8 0.75 160 0 0 152 0.4 0.44 157 0.15 0.13 1 2 × 4 8 0.75 160 0 ± 0 152 −0.4 ± 0.34 155 −0.25 ± 0.19 2 2 × 4 8 0.75 159 0.05± 0.05 158 −0.1 ± 0.19 157 −0.15 ± 0.13 3 2 × 4 8 0.75 157 −0.15 ± 0.23 152 −0.4 ± 0.54 151 −0.45 ± 0.25 4 2 × 4 8 0.75 159 −0.05 ± 0.05 159 −0.05± 0.05 157 −0.15 ± 0.13 5 3 × 4 12 0.75 235 −0.25 ± 0.29 184 − 2.8 ±0.96 232 − 0.4 ±0.44 6 3 × 4 12 0.75 228 − 0.6 ±1.14 204 −1.8 ± 1.16 228 −0.6 ± 0.44 7 3 × 4 12 0.75 222 −0.85± 1.03 192 −2.35± 1.73 219 − 1 ±0.5 8 3 × 4 12 0.75 234 − 0.3 ±0.51 189 −2.55 ± 1.05 226 −0.7± 0.61 9 3 × 4 12 0.75 228 −0.6 ± 0.34 205 −1.75 ± 0.79 230 −0.5 ± 0.25 10 4 × 4 16 0.75 247 −2.75± 0.79 211 −4.55 ± 0.55 243 −2.95± 0.85 11 4 × 4 16 0.75 255 −2.45 ± 0.65 229 −3.75 ± 1.39 253 −2.55 ± 1.45 12 4 × 4 16 0.75 258 −2.55 ± 1.55 234 −3.75 ± 0.89 265 − 2.2 ±1.06 13 4 × 4 16 0.75 235 − 3.6 ±0.84 215 − 4.6 ±1.34 253 −2.7 ± 2.11 14 4 × 4 16 0.75 269 −2.1 ± 1.19 233 −3.9 ± 2.99 261 −2.5 ± 2.75 15 5 × 4 20 0.75 257 −3.45± 1.35 235 −4.55± 1.15 254 −3.6 ± 2.44 16 5 × 4 20 0.75 243 − 4.1 ±1.79 238 −4.35 ± 1.63 261 −3.2 ± 2.06 17 5 × 4 20 0.75 224 −4.5 ± 1.75 227 −4.35 ± 1.63 247 −3.35± 2.63 18 5 × 4 20 0.75 235 −5.55± 1.65 248 − 4.9 ±3.69 277 −3.45 ± 3.85 19 5 × 4 20 0.75 253 −4.25 ± 2.49 234 −5.2 ± 3.06 258 − 4 ±1.9 20 6 × 4 24 0.75 235 − 5.6 ±1.54 262 −4.25± 1.89 281 −3.3± 3.11 21 6 × 4 24 0.75 243 −4.5 ± 1.35 249 − 4.2 ±2.06 256 −3.85± 2.43 22 6 × 4 24 0.75 248 −4.85± 2.63 263 −4.1 ± 1.69 282 −3.15 ± 1.83 23 6 × 4 24 0.75 250 −5.35 ± 2.83 294 −3.15± 0.93 304 −2.65 ± 1.73 24 6 × 4 24 0.75 230 −6.35 ± 1.83 260 −4.85 ± 3.03 275 − 4.1 ±3.39 × − ± − ± − ± 25 2 4 8 0.85 160 0 0 157 0.15 0.23 159 0.05 0.05 26 2 × 4 8 0.85 160 0 ± 0 150 − 0.5 ±0.35 158 − 0.1 ±0.09 27 2 × 4 8 0.85 160 0 ± 0 143 −0.85± 0.83 160− 0 ±0 28 2 × 4 8 0.85 160 0 ± 0 153 −0.35 ± 0.23 159 0.05± 0.05 29 2 × 4 8 0.85 160 0 ± 0 157 −0.15 ± 0.13 157 −0.15 ± 0.13 30 3 × 4 12 0.85 226 0.7± 0.51 196 − 2.2 ±1.06 229 −0.55 ± 0.95 31 3 × 4 12 0.85 233 −0.35± 0.53 200 − 2 ±1.5 229 −0.55 ± 0.75 32 3 × 4 12 0.85 217 − 1.1 ±1.59 176 3−.15± 0.83 229 − 0.5 ±0.35 33 3 × 4 12 0.85 231 −0.45± 0.55 208 − 1.6 ±0.94 235 −0.25± 0.29 34 3 × 4 12 0.85 235 −0.25 ± 0.19 184 −2.8 ± 0.96 236 − 0.2 ±0.16 35 4 × 4 16 0.85 222 −3.25 ± 0.49 209 −3.9 ± 1.69 242 −2.25± 1.99 36 4 × 4 16 0.85 266 −2.25 ± 0.59 237 −3.7 ± 0.81 274 −1.85 ± 1.03 37 4 × 4 16 0.85 258 −2.35 ± 0.63 244 −3.05± 1.05 283 − 1.1 ±0.79 38 4 × 4 16 0.85 257 −2.45 ± 1.05 228 − 3.9 ±1.69 265 −2.05± 2.75 39 4 × 4 16 0.85 233 −2.75 ± 0.59 209 −3.95± 0.95 248 − 2 ± 1 40 5 × 4 20 0.85 225 −5.05 ± 1.75 283 −2.15 ± 1.13 295 1−.55 ± 1.65 41 5 × 4 20 0.85 231 −4.75 ± 1.49 266 − 3 ±1.7 286 − 2 ±1.2 42 5 × 4 20 0.85 245 − 4.3 ±0.91 275 −2.8± 0.96 292 1−.95± 1.55 43 5 × 4 20 0.85 243 −4.1 ± 1.69 255 −3.5 ± 1.15 279 − 2.3 ±1.31 44 5 × 4 20 0.85 246 −3.3 ± 1.41 246 −3.3 ± 1.51 272 − 2 ±2.2 × − ± − ± − ± Table 6.5: Comparison of the quality (Avg. Hits better than base system) of solutions achieved by base system (single hammer FIFO), system with EIA and system with EEIA for larger values of m.

94 # N M m δ base base(rel) EIAEIA(rel) EEIAEEIA(rel) × 0 3 5 15 0.75 260 2 1.9 228 3.52 3.11 265 1.76 2.18 1 3 × 5 15 0.75 237 3−.24± 1.71 259 −2.19 ± 1.58 268 −1.76 ± 1.42 2 3 × 5 15 0.75 264 − 2.1 ±1.13 248 −2.86 ± 1.46 269 −1.86 ± 1.65 3 3 × 5 15 0.75 251 −2.57± 2.34 254 −2.43 ± 1.77 268 −1.76 ± 1.71 4 3 × 5 15 0.75 259 − 2.43± 1.2 239 −3.38 ± 2.81 248 − 2.95± 1 5 4 × 5 20 0.75 296 −4.19 ±3.39 283 −4.81 ± 3.68 316 −3.24 ±2.09 6 4 × 5 20 0.75 293 −5.29 ± 2.01 301 − 4.9 ±2.37 309 −4.52 ± 2.54 7 4 × 5 20 0.75 313 −4.76 ± 1.99 307 −5.05± 2.33 310 − 4.9 ±2.66 8 4 × 5 20 0.75 286 −5.86 ± 4.98 294 −5.48 ± 2.63 304 − 5 ±3.52 9 4 × 5 20 0.75 303 − 4.1 ±2.85 298 −4.33 ± 2.32 313 −3.62± 2.52 10 5 × 5 25 0.75 283 −8.86± 2.41 313 −7.43 ± 1.96 353 −5.52 ± 2.34 11 5 × 5 25 0.75 328 −7.43 ± 1.96 317 −7.95 ± 3.09 377 − 5.1 ±3.61 12 5 × 5 25 0.75 363 −6.24 ± 2.18 333 −7.67 ± 2.32 353 −6.71± 3.63 13 5 × 5 25 0.75 303 −6.76 ± 2.28 293 −7.24 ± 2.37 316 −6.14 ± 3.17 14 5 × 5 25 0.75 368 −6.29 ± 1.35 345 −7.38 ± 1.28 368 −6.29 ± 2.11 15 6 × 5 30 0.75 391 −7.76 ± 2.75 376 −8.48 ± 2.34 412 −6.76 ± 4.47 16 6 × 5 30 0.75 349 −8.81 ± 4.06 324 − 10 ±2.95 359 −8.33 ± 2.51 17 6 × 5 30 0.75 363 − 9 ±3.9 379 −8.24± 2.28 409 −6.81 ± 2.15 18 6 × 5 30 0.75 345 9−.67± 4.41 357 − 9.1 ±3.23 387 −7.67 ± 4.51 19 6 × 5 30 0.75 322 −9.19 ± 3.68 318 −9.38± 4.14 358 −7.48 ± 3.96 20 7 × 5 35 0.75 347 −10.9 ± 2.75 368 − 9.9 ± 3.9 393 − 8.71± 4.3 21 7 × 5 35 0.75 425 − 8 ±2.76 380 −10.14± 4.69 429 −7.81 ± 5.2 22 7 × 5 35 0.75 388 −9.71± 3.44 388 − 9.71 ±5.35 405 −8.9 ±3.99 23 7 × 5 35 0.75 378 −10.1 ± 3.32 412 −8.48 ± 6.82 417 −8.24± 3.32 24 7 × 5 35 0.75 401 −9.81 ± 4.34 372 −11.19± 5.77 395 −10.1 ± 7.04 × − ± − ± − ± 25 3 5 15 0.85 251 2.38 2.43 236 3.1 3.04 277 1.14 1.46 26 3 × 5 15 0.85 249 −2.29 ± 3.35 265 −1.52± 2.54 255 − 2 ±2.38 27 3 × 5 15 0.85 251 −2.29 ± 2.11 248 −2.43 ± 1.67 265 −1.62± 1.85 28 3 × 5 15 0.85 264 −2.19 ± 1.96 256 −2.57 ± 1.29 264 −2.19 ± 1.39 29 3 × 5 15 0.85 239 −3.24 ± 1.42 270 −1.76 ± 1.51 260 −2.24 ± 1.42 30 4 × 5 20 0.85 266 −5.24 ± 1.99 296 − 3.81± 3.3 314 −2.95 ± 2.14 31 4 × 5 20 0.85 298 −4.67 ± 1.46 302 −4.48 ±0.92 333 − 3 ±1.14 32 4 × 5 20 0.85 320 − 4.19± 3.2 309 −4.71 ± 1.82 331 −3.67± 2.03 33 4 × 5 20 0.85 299 −4.86 ±2.03 300 −4.81 ± 2.06 323 −3.71 ± 4.11 34 4 × 5 20 0.85 325 − 4.05± 1 306 −4.95 ± 1.95 329 −3.86 ± 1.46 35 5 × 5 25 0.85 349 −7.48 ±4.15 341 −7.86 ± 3.07 386 −5.71 ± 3.06 36 5 × 5 25 0.85 352 − 6.9 ±1.71 361 −6.48 ± 1.58 395 −4.86 ± 1.65 37 5 × 5 25 0.85 361 −6.71± 3.35 338 −7.81 ± 3.68 386 −5.52 ± 2.92 38 5 × 5 25 0.85 364 −6.24 ± 1.42 347 −7.05 ± 4.14 377 −5.62 ± 4.24 39 5 × 5 25 0.85 336 −6.81 ± 3.39 338 −6.71 ± 2.59 371 −5.14 ± 2.03 40 6 × 5 30 0.85 410 − 8 ±3.62 380 −9.43 ± 4.15 438 −6.67 ± 3.46 41 6 × 5 30 0.85 355 −8.71± 3.54 366 −8.19 ± 4.82 396 −6.76 ± 3.42 42 6 × 5 30 0.85 345 − 9.67± 2.7 380 − 8 ±3.71 414 −6.38 ± 3.47 43 6 × 5 30 0.85 328 −10.1 ±2.85 373 −7.95± 2.24 401 −6.62 ± 3.09 44 6 × 5 30 0.85 317 −10.86± 3.74 368 −8.43 ± 2.34 405 −6.67 ± 4.03 45 7 × 5 35 0.85 396 − 9.14 ±3.36 423 −7.86 ± 4.88 442 −6.95 ± 4.52 46 7 × 5 35 0.85 355 −11.05± 3.47 407 −8.57 ± 3.48 435 −7.24 ± 4.47 47 7 × 5 35 0.85 342 −11.14 ± 1.65 385 − 9.1 ±4.85 427 − 7.1 ± 7.8 48 7 × 5 35 0.85 382 − 8.57 ±7.67 377 −8.81± 5.96 373 − 9 ±6.19 49 7 × 5 35 0.85 329 −9.71 ± 2.78 353 −8.57 ± 3.58 383 −7.14± 5.17 × − ± − ± − ± Table 6.6: Comparison of the quality (Avg. 
Hits better than base system) of solutions achieved by base system (multi-hammer NN), system with EIA and system with EEIA for larger values of m.

95 Run # m δ base base% EIAEIA% EEIAEEIA% 1 10 .85 7774 20.34 5.58 7676 18.87 5.14 6927 7.29 5.18 2 10 .85 7330 14.10 ± 5.47 7617 18.45 ±11.82 6926 7.75 ± 6.27 3 10 .85 7182 16.97 ± 6.04 7404 20.64± 5.96 6625 7.78 ± 5.08 4 10 .85 7554 15.72 ± 4.20 7722 18.51 ± 8.06 7043 7.97 ± 6.40 5 10 .85 7064 13.90 ± 6.29 7570 22.16 ± 6.40 6775 9.28 ± 6.97 ± ± ± 6 10 .75 7832 13.79 4.06 8243 19.88 7.78 7647 11.18 6.48 7 10 .75 7220 15.03 ± 5.75 7853 25.12 ± 7.63 7184 14.35 ± 6.87 8 10 .75 8067 13.52 ± 5.33 8529 20.11 ± 6.90 7901 11.21 ± 5.84 9 10 .75 7361 17.21 ± 6.11 7770 23.78 ± 8.74 7196 14.63 ± 5.77 10 10 .75 7885 17.49 ± 7.72 8247 22.83 ±10.64 7805 16.31 ± 7.38 11 20 .75 10786 23.12 ± 5.64 10870 24.07± 5.78 10162 16.01 ± 5.77 12 20 .75 10618 24.75 ± 7.02 10712 25.92 ±13.83 9902 16.25 ± 6.29 13 20 .75 12134 21.83 ± 4.95 12042 20.93± 6.39 11514 15.64 ± 6.71 14 20 .75 10423 26.12 ± 6.19 10402 25.82 ± 7.16 9621 16.35 ± 5.85 15 20 .75 11233 25.86 ± 6.07 11132 24.69 ± 6.83 10515 17.77 ± 7.60 16 25 .75 12232 23.54 ± 4.59 12158 22.80 ± 6.40 11679 18.00 ± 6.36 17 25 .75 13228 23.42 ± 5.83 12995 21.25 ± 5.50 12345 15.19 ± 6.01 18 25 .75 12805 24.33 ± 4.14 12375 20.16 ± 6.50 11926 15.80 ± 6.00 19 25 .75 12413 25.71 ± 4.77 12199 23.54 ± 7.06 11567 17.15 ± 7.47 20 25 .75 13060 22.76 ± 5.20 12782 20.18 ± 6.07 12254 15.21 ± 6.04 21 30 .75 13945 23.51 ± 4.97 13510 19.73 ± 6.90 12922 14.48 ± 5.75 22 30 .75 14451 23.72 ± 5.78 14275 22.22 ± 6.82 13649 16.87 ± 6.09 23 30 .75 14833 26.81 ± 4.75 13946 19.21 ± 5.47 13516 15.53 ± 5.24 24 30 .75 14006 20.81 ± 4.73 13695 18.13 ±15.01 13491 16.33 ± 5.24 25 30 .75 14103 24.40 ± 5.12 13611 19.99 ± 18.61 13586 19.84 ± 5.91 26 35 .75 14613 25.05 ± 4.45 14334 22.68± 5.03 13844 18.49 ± 6.21 27 35 .75 14441 25.92 ± 3.78 13751 19.85 ± 9.34 13463 17.32 ± 5.96 28 35 .75 15695 22.23 ± 3.68 15221 18.58 ± 5.95 14855 15.70 ± 5.09 29 35 .75 14973 21.80 ± 4.37 14570 18.52 ± 7.05 14286 16.19 ± 6.69 30 35 .75 15123 23.31 ± 5.92 14610 19.10 ± 5.15 13973 13.88 ± 5.34 ± ± ± Table 6.7: Comparison of the quality (total time over all run instances of an experiment) for Pickup and Delivery with 2 vehicles and measuring efficiency by time, varying values of m.

96 number of agents working.

The previous sections have established how results suffer for smaller run instances, so there is no further need to dwell on this. In Table 6.8, a size m = 30 is constant for all run instances. At this size the results are positive as expected.

We do, however, need to select a different optimization criterion. If adding additional vehicles while measuring by time, the larger fleets will out-perform the smaller fleets as expected. Up to a point, every vehicle added means the run instance will be completed sooner. The natural limit obviously occurs when vehicles out-number tasks, but the actual performance ceiling will probably come with fewer vehicles than this. For 30 tasks, 2 vehicles is already very close. The Hit/Miss

Ratio, as a measure, has this same problem.

Having the dynamic base system push close to the optimal performance eliminates the need for advice, and so would display poor results for an advisor.

The distance measure will partially mitigate this problem. The combined distance of all vehicles

(so-called Total Travel Cost), will account for inefficiencies that are ignored by the completion time measure. Superfluous vehicles (leaving the depot, then returning without doing useful work), will not register any cost by the time measure, but certainly contribute to distance inefficiency.

Additionally, if a vehicle is unaware of future tasks, it may mistakenly return to the depot, only to be redeployed later, covering even more unnecessary distance. Adding this inefficiency, while also using a measure that does not necessarily favor larger fleets, the wedge separating the base system’s performance from the optimal remains large enough for advice to be valuable.

Table 6.8 shows 10 distinct runs. Each run has been evaluated for the 3 system configurations

(base, EIA, EEIA), and varying the number of agents (2,3, and 4). Each of the resulting trials are displayed as before:

1. The accumulated distance (base, EIA, EEIA), where lower is better.

97 2. Average percent worse than optimal, 1 standard deviation, (base%, EIA%, EEIA%), ± where closer to zero is better.

Seeing the number of vehicles increase from 2, to 3, to 4, is visible causing an increase in the distance covered (on the same run instance). This distance increase is an inefficiency increase, but it is primarily harming the performance of the unadvised system. It does not appear to have any consequence with regard to the EIA’s or the EEIA’s ability to improve efficiency. The EIA is consistently improving on the base system’s performance, and the EEIA is consistently improving on that. If there is a relationship between the number of vehicles and performance, it cannot be as significant as the effect of m or δ.

The expectation (extrapolating from these results) is that, assuming an appropriate measure of efficiency is chosen, the number of vehicles should be a neutral factor. Of course, using either of the 2 other measures (Hit/Miss Ratio, Completion Time), we would not see these results. Both of the other measures fail to distinguish the wedge between emergent and optimal as more vehicles are added, leaving no room for the EIA or EEIA to make improvement.

98 run# Vehicles δ m base base% EIAEIA% EEIAEEIA% 75.1 2 0.75 30 14001 19.53 0.22 13910 18.78 0.23 13424 14.66 0.27 75.2 2 0.75 30 14294 19.2 ±0.24 14019 16.91 ± 0.23 13349 11.33 ± 2.72 75.3 2 0.75 30 13717 18.07± 0.23 13628 17.36 ± 0.32 13126 12.95 ± 1.27 75.4 2 0.75 30 15211 17.26 ± 0.16 14990 15.6 ±0.25 14627 12.84 ± 0.21 75.5 2 0.75 30 13622 15.43 ± 0.21 13525 14.5 ± 1.48 13495 14.33 ± 0.31 75.1 3 0.75 30 14295 21.16 ± 0.19 13956 18.34± 0.26 13617 15.46 ± 0.32 75.2 3 0.75 30 14617 21.38 ± 0.15 14114 17.19 ± 0.25 13705 13.81 ± 0.37 75.3 3 0.75 30 14010 19.53 ± 0.17 13544 15.4 ±2.27 13403 14.39 ± 0.27 75.4 3 0.75 30 15422 18.26 ± 0.16 15157 16.28± 0.3 14509 11.33 ± 1.26 75.5 3 0.75 30 13842 16.51 ± 0.15 13978 17.67 ± 0.2 13540 13.98± 0.3 75.1 4 0.75 30 14473 21.61 ± 0.22 14178 19.12 ±0.28 13526 13.67 ± 0.6 75.2 4 0.75 30 14879 22.4 ±0.18 14449 18.87 ± 0.37 13890 14.27 ±0.34 75.3 4 0.75 30 14413 22.25± 0.19 13891 17.89± 0.3 13410 13.78 ± 0.38 75.4 4 0.75 30 15468 18.05 ± 0.17 15240 16.27 ±0.16 14609 11.51 ± 2.38 75.5 4 0.75 30 14208 18.56 ± 0.22 13981 16.69 ± 0.19 13750 14.68 ± 0.22 ± ± ± 85.1 2 0.85 30 14542 18.27 0.14 13959 13.62 0.29 13302 8.27 0.27 85.2 2 0.85 30 16435 16.64 ± 0.11 15660 11.17 ± 0.12 14583 3.67 ± 2.58 85.3 2 0.85 30 15208 16.93± 0.1 14822 14 ±0.15 14215 9.35 ± 0.18 85.4 2 0.85 30 13569 25.54 ±0.24 12228 13.15± 0.2 11800 9.12 ± 0.18 85.5 2 0.85 30 14158 16.75 ± 0.12 13774 13.6 ±0.15 13258 9.34± 0.1 85.1 3 0.85 30 14713 19.13 ± 0.17 14100 14.15± 0.27 13234 7.19 ±0.71 85.2 3 0.85 30 16525 16.99 ± 0.19 14943 5.66 ±5.59 14547 3.05 ± 3.68 85.3 3 0.85 30 15299 16.65 ± 0.08 14868 13.39± 0.18 14311 9.17± 0.2 85.4 3 0.85 30 13912 27.9 ±0.27 12315 13.09 ± 0.19 11862 8.99 ±0.21 85.5 3 0.85 30 14348 17.52± 0.1 13304 8.99 ± 4.7 12950 6.01 ± 3.31 85.1 4 0.85 30 14977 20.24 ±0.16 14142 13.57± 0.2 13427 7.87 ± 0.22 85.2 4 0.85 30 16685 17.12 ± 0.08 15651 9.96 ±1.73 14931 4.94 ± 2.13 85.3 4 0.85 30 15723 19.22 ± 0.14 15051 14.15± 0.21 14308 8.6 ±0.56 85.4 4 0.85 30 14152 28.74 ± 0.35 12314 11.97 ± 1.22 11984 8.97± 0.2 85.5 4 0.85 30 14667 19.54 ± 0.17 14074 14.71 ± 0.17 13545 10.4 ±0.18 ± ± ± Table 6.8: Comparison of the quality (total time over all run instances of an experiment) for Pickup and Delivery with 2 vehicles and measuring efficiency by distance, varying number of vehicles.

Chapter 7

Related Works

Through the methods described in this thesis, trade-offs have been faced. Some of these were functional, others were performance based, and others were simply for design reasons. Other authors have faced the same trade-offs and gone different directions.

Since the EIA and EEIA are defined abstractly, in the management and control area, there will be competitors here. However, the application domains (Whack-a-mole and Pickup and Delivery) also have received a great amount of attention. Pickup and Delivery is particularly diverse in approaches. We give a brief overview of each here.

7.1 Control Theory

Acknowledging that autonomous agents, with local perspectives, tend to have sub-optimal per- formance opens the door to the possibility of improving that performance, usually by adding an additional entity to adjust and manipulate the agents behaviors. The EIA and EEIA are examples of these entities, but they are not the only ones, nor the first. While we say what the advisor does is advise, others choose similar words: control, correct, consult, manage. All of these refer to

basically the same idea, and do not really reveal the trade-offs individual authors have faced.

In works such as [IBM06], [Sha94], [BM+06], [SLT08] we see feedback loop diagrams very simi-

lar to Figures 2.3 and 3.1, which indicate that (at least on a high level) they are similar ideas. They

follow the same basic steps (as do we):

1. Gather Knowledge

2. Process Knowledge

100 3. Act on Knowledge

Differences come in the details. Primarily, most authors do not generally make the commitment to the controller remaining a non-essential, non-omniscient component, as is one of the main strengths of the EIA and EEIA.

Focus deserves to be on [TP+11] (which is a generalization of [BM+06]), since it seems to be the closest to the EEIA in design. These works describe an Observer/Controller loop, influencing agents in a Organic Computing system, the same way the EEIA does, by using exception rules.

This has separate centralized and decentralized variations, with decentralized versions carrying a separate Observer/Controller internally. Of course, achieving decentralization in this way means that the global view is sacrificed in favor of a perpetually available view. The EIA and EEIA have gone the opposite way, assuming the communications and visibility between the agents and the advisor will be unreliable, so completing all steps related to advice prior to deployment. This saves the global perspective, while remaining unobtrusive, but sacrifices the ability to control on- the-fly.

[TP+11] is applied in 3 very different settings: Elevator Control, Off-highway Machines, and

Cleaning Robots. These 3 problems were solved with 3 significantly different approaches. Techni- cally, each system under control is capable of functioning on its own, and the Observer/Controller was applied diverse ways, that were customized for each problem. This concept did not give a specific formal definition for how observations would be made, or how control would be exerted, rather these definitions were given entirely in problem specific context.

This highlights another major trade-off faced by these controllers: Those who decide to give spe- cific definitions are almost always limited to single problems. Alternatively, those who want to be applicable to many problems (often all of Multi-Agent Systems) are limited to describing their concepts in vague terms, leaving much of the detail open until the time of application.

The EIA and EEIA have gone the middle way. A subset of dynamic problems (specifically Dy-

101 namic Task Fulfillment) was chosen. Within this relatively large restricted space, we were able to commit to nearly every detail of knowledge processing in a way that is independent of the specific problem we face. The actual mechanism for observation, as well as the exception rule generation procedure, in addition to learning and optimizing, are reusable across the entire space, and will not require any redesign when expanding to new problems.

In daring to make broad commitments to yet unknown problems, the EEIA apparently stands alone.

7.2 Whack-a-mole

Whack-a-mole could be treated simply as the Traveling Repairman Problem, with a specialized quality measure, however there is at least one publication which has addressed it directly.

[GK+06] provides a formal mathematical perspective on the game. They use a similar graph based environment as us, and the same quality measure (maximizing hits). Theorems and proofs for the complexity of the static variant of the problem are provided rather than empirical analysis. Both the NP-Hardness of the general case, but also a special case (a restriction on Run Instances, to use our terminology), where the optimal can be found in polynomial time.

Experimental differences, which make comparison to those in Chapter 6 slightly difficult, are present. Multiple moles can be visible in the same hole simultaneously, and the full time window of each mole is known (even in the dynamic case). While these variations can make an interesting mathematical problem, they do not reflect the setting of the original arcade game. For this thesis, the setup was mostly designed to reflect that setting.

Dynamic strategies are compared against a non-abusive optimal, which cannot move except when moving in the direction of a visible mole. This is an interesting restriction to make as it partially removes the ability to use foreknowledge but would not be a detriment to any dynamic solution

102 under any circumstance. Still, in the general case, no dynamic solution is able to compete with the

non-abusive optimal.

For the dynamic variants of the problem, competitive analysis is given. A fraction of the optimal

performance is calculated which their dynamic strategies are capable of achieving, even in the

worst case. This particular fraction, called the competitive ratio, is provided as a function of the

time window widths, and the number of moles per hole, and tends to be near 0.

A dynamic solution is said to be c competitive (where c is the competitive ratio) if in the − 1 worst case, the dynamic score is at least c times the optimal. In the case of deterministic dy- namic strategies acting against a non-abusive optimal, it has been determined that c = max NT + { 1,N( 3T/2 ) , where N is the limit on moles per hole, and T is the longest time window. b c } In all of the experiments in this thesis we would have 1 qual(opt) 0, which should not be a c · ∼ surprise. Especially against FIFO, it is always possible to fabricate a run instance where FIFO gets

0 hits, but the optimal gets at least 1 hit. Those familiar with the game “pig-in-the-middle” should

be able to see this, as well as the fact that it applies equally to nearest-neighbor.

In addition to this analysis, the paper gave special attention to edge cases with the following dy-

namic solutions:

Replan, At every time step, calculate the optimal move, based only on what is • known so far.

First come first kill (FCFK), same as the FIFO solution used in this thesis. • Ignore and whack the most (IWTM), navigate to the hole with the most moles • released in the previous interval.

Many of the proofs provided in [GK+06] apply to all dynamic strategies, not just these.

103 7.3 Pickup and Delivery

It should be no surprise that there is more interest in Pickup and Delivery than there is in Whack- a-mole. Pickup and Delivery has entire industries built upon it, and investing research hours in improving performance does result in real financial gains. This field is widely varied and many authors describe environments so specialized that they defy comparison. [BC+07] (for static vari- ants) and [BCL10] (for dynamic variants) provide an extensive catalog of these exotic problems, like swapping problems, single commodity problems, Hamiltonian walks and more.

The remainder of this section describes several approaches that more closely resemble the problem and approach described in Chapter 5.

Since the work of multiple authors is being considered, there will be ambiguity in the language of the field. It must be resolved before continuing:

The single-vehicle solutions will be treated as a special case of multi-vehicle solu- • tions. For the sake of generality, only multi-vehicle solutions will be discussed.

• The base Pickup and Delivery or Vehicle Routing problems will be considered special cases of their variants with Time Windows (which are achieved by making the time windows arbitrarily large). Again, for the sake of generality, the focus here will be placed on solutions with Time Windows.

• In [BC+07], a distinction is made between one-to-one and many-to-many systems. In one-to-one, a request is like a sealed envelope, with a sender and recipient labeled on it. In many-to-many, a request is more like a commodity, where there are many suppliers and many customers, and it does not matter which is paired with which. In [PDH08b], the same distinction is made; however, one-to-one is referred to as paired and many-to-many is referred to as unpaired. Here only the one-to-one/paired solutions are discussed, since many-to-many is a large enough field to deserve a separate treatment.

• At least two papers ([MHH07] [MS+10]) contrast optimization-based versus agent-based approaches. We will consider all approaches to be agent-based; the same distinction will be made by the terms autonomous and non-autonomous.

We will be identifying systems by citing the paper they were defined in; however, this does create further ambiguity to be resolved. Some papers describe multiple systems, and some systems are described in multiple papers. The following conventions are established to mitigate this problem.

• The publication [WBH08] describes 2 systems, DynCNET and FiTA. These systems will be referred to as [WBH08]-D and [WBH08]-F in the classification.

• The publication [MHH07] describes 2 systems. The first, an auction simulator, is the one we are referring to. The other is an operational research heuristics system, which was established in different papers ([HH+02] and [HE+02]) and is only briefly described in this one.

• The publication [MS+10] describes 2 systems. One, based on MILP for online optimization, will be called [MS+10]-O. The other is an auction-based system, [MS+10]-A.

• The publications [FK93], [FMP96], and [BFV97] (and to a lesser extent, [FSS03]) all describe the same system in various stages of development. This system will be addressed once, and referred to as [FMP96].

• The publications [SD+10], [KB+10], and [HD+10] all describe the same system in various stages of development. This system will be addressed once, and referred to as [KB+10]. This citation represents the EIA (from Chapter 2) applied to the DIC approach (from Chapter 5). The EEIA applied to DIC is similar and differs only in a single dimension; see Section 7.3.1.

A classification in 10 dimensions is applied to these approaches.

• How pre-determined are an agent's actions?
• How does the system handle new requests mid-execution?
• What is the impact of an agent/component failing?
• What happens when communication fails mid-execution?
• Which agents require constant communication?
• How much variation is there in vehicle size?
• How can tasks be given emphasis or priority?
• What is the optimization criterion?
• What limitations are there on simultaneous jobs?
• What ability is there to cancel a task?

Each of these dimensions has multiple levels (described in separate sub-sections). This should give a rapid, but highly detailed, picture of the variation in the space of approaches which can be considered competitors to the DIC approach in Chapter 5. Specifically, these approaches are for multi-vehicle, paired, pickup and delivery with time windows.

7.3.1 How pre-determined are an agent’s actions?

Total lack of autonomy is a common quality for the centralized and static solutions. If a global solution or exact solution is required, this is the only way.

The important thing to note is that pre-determination does not add any hard functionality, and is never a requirement for any job. A plan will aid performance, make routes shorter and faster, and prevent multiple resources from being committed to a single task. A purely dynamic system, which cannot plan in the long term, can still work in all environments; it will simply take longer and cost more to do the same job.

The EIA (listed as [KB+10]) is at the first level, not having any explicit tasks or assignments. The EEIA introduces the (flexible) planning concept, so it is better represented by the 3rd level in this dimension. In all other dimensions, the 2 advisors applied to DIC can be seen as the same.

Possible answers, with some overlap:

1. An agent does not even have the concept of a task. See [KB+10] [WBH08]-F.

Without the explicit command to go fulfill a particular task, there are still examples of systems that can direct a vehicle to locations where it may be productive. These agents are simply compelled by a variety of other means to be in the right places at the right times.

2. Agents autonomously agree to single tasks (at a time) only. See [WBH08]-D.

A vehicle agent can know the details of its work in some cases. The pickup location and time window of the next job to fulfill can be all the information required for the vehicle's navigation.

3. Agents follow a plan, but it may be changed/ignored/re-optimized over time. See [MHH07] [MS+10]-O [MS+10]-A [DC05] [FMP96] [HE+02].

Either the vehicles or some central planner will carry knowledge of a portion of the tasks that have not started yet. Various ways exist to exploit this knowledge for better efficiency.

4. Agents carry a plan and will not deviate from it. See [DC05] [JO86] [DDS91].

This represents what is usually referred to as a static system. The existence of complete knowledge is meant to allow for a closer-to-optimal solution.

7.3.2 How does the system handle new requests mid-execution?

Handling new requests on-the-fly is the principal quality of dynamic systems. These are systems that can operate without complete knowledge, and will often (but not necessarily) start completely without knowledge.

Unlike Section 7.3.1, this does represent a strict requirement in some cases. If a system will be used for taxis or pizza delivery, immediate responses are required for unexpected requests. Pizzas are demanded in 30 minutes or less. Passengers waiting for taxis in a convenience store parking lot cannot call for a ride a day in advance to accommodate planning.

Possible answers, with some overlap:

1. Tasks must be known in advance, for planning. See [JO86] [DDS91].

This is also necessary in the strictly static case. Complete knowledge is used for a near-optimal plan, and behavior is deterministic.

2. Tasks may be added to a plan by a central planner re-optimizing the plan. See [MS+10]-O [DC05] [HE+02].

A central planner may have global or near-global knowledge of the environment. If this is the case, better solutions can be found, compared to the vehicle's narrower view.

3. Tasks may be added to a plan by agents autonomously accepting them. See [MHH07] [MS+10]-A [FMP96].

The alternative to a central planner making adjustments to a plan is that they are made locally, by the vehicle itself. The benefit is that the system is more robust to poor communications and distributes the computational load required for making decisions, though the result will tend to be a less optimal solution.

4. Agents accept tasks autonomously because there is no plan. See [KB+10] [WBH08]-D [WBH08]-F.

Some systems will not explicitly assign tasks to vehicles. Instead, the vehicles will simply explore the environment and accept any tasks they encounter. This is typically the option which is most robust to failure of system components, but also the option of poorest efficiency.

7.3.3 What is the impact of an agent/component failing?

Sometimes a vehicle may fail. This is a very familiar situation in the real world, and only some approaches prepare for it. If a failing vehicle has tasks assigned to it, the tasks can still be completed by reassigning them to other vehicles. Otherwise, the global solution quality is impacted. In other cases, it can be a central planner which fails. This is sometimes, but not always, acknowledged as an entity in the environment. A central planner is usually present in auction-based systems, which require something along the lines of an auctioneer, though there are exceptions to this too.

Obviously, in all cases it is better not to have a single point of failure in any system. For a generalized system in particular, avoiding one is a strong feature because it offers more flexibility.

Possible answers, with some overlap:

1. If a vehicle fails, others will compensate for it. See [KB+10] [WBH08]-D [WBH08]-F [MHH07] [MS+10]-A.

When one vehicle fails, the tasks that were assigned to it will be reassigned to a different vehicle. This is still a concern in the cases where assignments are not made. If regional divisions exist, vehicle failure should not be able to completely remove all vehicles from a region.

2. A centralized element exists which is essential to the system. See [MS+10]-O [DC05] [FMP96] [HE+02].

This central component can take the form of a dispatcher, optimizer, or auctioneer in some systems. It is responsible for assigning work in some cases, or simply notifying agents of changes to the plan. The component failing will often allow vehicles to finish their current work, but proceed no further.

3. The system does not acknowledge the possibility of a vehicle failing. See [JO86] [DDS91].

Often systems only seek to make the best approximation of an NP-hard problem. For these, vehicle failure is not mentioned because the original Travelling Salesman Problem did not acknowledge vehicle failure either.

7.3.4 What happens when communication fails mid-execution?

General robustness depends on what happens when the network fails. This is not only possible but a very common experience in electronic communication, and it should be acknowledged and compensated for by the system.

This is somewhat related to Section 7.3.3, since a remote component of the system being offline is indistinguishable from being unable to communicate with that component.

Still, it is given a separate treatment here, since communication networks are typically the responsibility of a utility company, not part of the system itself. Also, some systems make only rare use of communication, unlike vehicles, which are always fundamental to the system. In many cases, communication is not even mentioned.

Possible answers, with some overlap:

1. Not acknowledged in the design. See [KB+10].

Like component failures, communication failures were not considered in the original mathematical model. They are also often ignored in some approaches, no matter how common they are in real-world implementations.

2. Vehicles can still navigate to the last known destination. See [WBH08]-F [HE+02].

In many systems, the vehicles themselves will have memory to store a task's detail. At minimum, the next location, either for pickup or delivery, will still be known. Though, if this next location was a pickup location, the job may have been completed by another vehicle (or canceled during the communications outage).

3. Vehicles can complete known tasks but not learn of new tasks. See [WBH08]-D [MHH07] [MS+10]-O [MS+10]-A [DC05] [FMP96].

Here, a vehicle's memory of a plan is robust enough to store the details of several (or all) tasks. Tasks have explicit assignment, so there is no risk of the task being completed by a different vehicle during the communications outage.

4. No communication was required mid-execution. See [JO86] [DDS91].

In static approaches, vehicles receive the entire plan (which never changes) prior to start-of-day deployment. There are no changes or other information past this point that would be missed during a communications outage.

7.3.5 Which agents require constant communication?

Communication is a limiting factor on scalability. Particularly in the case of broadcast traffic, there is a trade-off: if broadcast is heavily used, centralized elements (like auctioneers, planners, optimizers, etc.) become unnecessary, but the number of agents on a network is effectively limited to a couple hundred.

Possible answers, with some overlap:

1. Not acknowledged in the design. See [KB+10].

Occasionally the description of a system will focus on how the problem is solved, while leaving out details of how the solution is communicated. There are multiple ways to do this. It may make sense to leave the question open and treat it as a separate problem.

2. Communication occurs only at pre-determined locations. See [JO86] [DDS91].

A vehicle and a central planner may temporarily be in the same building, or even the same room, such as a depot. Reliability of communications is much less of a concern over such short distances.

3. A central planner knows every task assignment. See [MS+10]-O [DC05] [FMP96].

There is a minor trade-off here. If the central planner knows every assignment, it has the potential to use global knowledge to improve efficiency. The downside is that it may eventually become a bottleneck, if the planner's resources are insufficient for the traffic.

4. A central planner knows every vehicle's position. See [HE+02].

Much like the previous case, the system has more knowledge to work with when optimizing, but must also handle an increased amount of network traffic. This case is more severe, though: vehicle positions change far more often than customer requests.

5. Agents are aware of only a portion of tasks and other vehicles. See [WBH08]-D.

To ease the burden of network traffic, it is often useful to partition the space somehow. Typically, this is done according to geographic conditions. The trade-off is that solutions are no longer based on global knowledge.

6. Every vehicle is aware of every task, or other vehicle. See [WBH08]-F [MHH07] [MS+10]-A.

This is the class that requires the most network activity of all. It is typically done to solve centralization problems, such as having auctions without an auctioneer.

7.3.6 How much variation is there in vehicle size?

In some cases all vehicles are the same, and it does not matter which is assigned to which job. In others, some are specialized. It may be as simple as being large enough to carry certain loads, but there may also be other conditions.

If jobs have standardized shapes and sizes, the vehicles can also standardize. An example would be a forklift, which can lift any load placed on a pallet.

However, for the transportation of people, there may be other issues like wheelchair accessibility, which requires a specialized vehicle. In common experience, a taxi company does not specialize every vehicle this way; only certain vehicles, which are available on request, are equipped for it.

Possible answers, with some overlap:

1. Vehicles are unit-weight; they carry exactly 1 or 0 tasks. See [WBH08]-D [WBH08]-F [MHH07] [MS+10]-O [MS+10]-A [HE+02].

Many systems describe vehicles in a simple Boolean way: loaded or empty. Generalizing this so that it can be compared to systems where capacity varies, we represent every vehicle with capacity 1, and every task with size 1.

2. Vehicles are all the same size, but are larger than a single job requires. See [JO86].

Vehicles in this case have a counted capacity. Their capacity is described as some multiple of tasks, with all tasks considered to be the same size.

3. Vehicles and jobs are non-homogeneous, coming in different sizes. See [FMP96].

Each vehicle has an attribute, its maximum load. Each task also has an additional attribute, size. This could represent weight or volume; some systems represent both.

4. Vehicles and jobs are non-homogeneous, coming in different specializations. See [DC05] [DDS91].

Some systems recognize certain tasks as having a special requirement, such as wheelchair accessibility or hazardous material classification. Obviously, a vehicle specialized for one will not handle the other, so vehicles are classified in a way that is compatible with the work.

7.3.7 How can tasks be given emphasis or priority?

It can be useful to make one job “more important” than another. If one person is going to the mall, but another is going to the hospital for an emergency, the one going to the hospital is more important. This is particularly important when there are too few vehicles to complete every task.

Priority is how those decisions would be made.

Even when there are enough vehicles, you may want to emphasize which jobs should be done sooner, or faster.

Possible answers, with some overlap:

1. All tasks are given equal priority. See [KB+10] [MS+10]-O [JO86].

The concept of priority is not universally part of the PDP-TW specification, so not all systems attempt to address it.

2. Priority may emerge as a side-effect of other attributes. See [MHH07] [MS+10]-A [DC05] [FMP96].

Often, in auction simulators, a job will have a value to be compared against the cost of fulfillment to determine profit. This can create a priority system in effect, especially when combined with cancellations, reserve prices, or other enhancements.

3. All tasks have a numeric priority. See [WBH08]-D [WBH08]-F [HE+02] [DDS91].

In a system with explicit priority, tasks will tend to be given a number to indicate how important they are. Like the weight or size of a task, this will be an additional attribute.

7.3.8 What is the optimization criterion?

With many parameters to change, we need a way to measure whether or not a change improves the system. The simplest criterion is likely the count (or percentage) of completed jobs. This is not really the best measure though, since most systems are not operating at their breaking point.

Successful systems complete nearly 100% of jobs in most cases, so to distinguish between them, we need other measures. Distance is common because it is easy to compare with the optimal solutions of Travelling Salesman optimizers.

Time may be a better measure, because distance would not penalize standing still while jobs are completed late.

Possible answers, with some overlap:

1. Distance. See [KB+10] [DC05] [FMP96] [HE+02] [DDS91].

This is the easiest to measure in a simulator. A log of each vehicle's position over time will show the total distance traveled. More importantly, this corresponds directly to fuel spent. It makes economic sense to minimize distance traveled.

2. Time. See [WBH08]-D [WBH08]-F [HE+02].

Time is what a customer will find more important than distance (which the customer will not even want to know). The system that completes a fixed number of tasks in the shortest time will usually also complete the largest number of tasks in a fixed amount of time. It makes economic sense to be able to get the most done with limited resources.

3. Ratio of empty/non-empty. See [MHH07] [MS+10]-O [MS+10]-A.

Some authors simply consider a loaded vehicle to be doing useful work. And it is a stronger argument to say that an empty vehicle is a wasted resource. This often gets attention in the analysis of a system's performance, but is not the primary criterion for optimization.

4. Constraint violation. See [MHH07] [MS+10]-O [MS+10]-A.

A task being completed late, or failing to be completed entirely, is a matter that deserves attention, though addressing the root of the problem usually means optimizing for distance or time.

5. Profit. See [MHH07] [DC05] [FMP96] [JO86].

This can implicitly take into account distance, time, or both. A task has value, and profit is found by subtracting the cost of completing the task. Completing a task will, at minimum, require fuel spent, and possibly also vehicle wear and driver's wages, which implies time is considered.

7.3.9 What limitations are there on simultaneous jobs?

This is closely related to Section 7.3.6, but not quite the same question.

No matter how many empty seats are in a private limousine, it will not pick up a new customer until after it has dropped off its current group. But, for a large enough group of people it may send multiple vehicles to a single request.

There is a fundamental separation between the questions “how much can it carry?” and “will it carry both at the same time?”

Possible answers, with some overlap:

1. Vehicles are unit-capacity (1 task per vehicle, at a time). See [KB+10] [WBH08]-D [WBH08]-F [MHH07] [MS+10]-O [MS+10]-A [HE+02].

In the case that capacity is not mentioned in the design of a system, all vehicles will be implicitly given the capacity 1, and all tasks are given size 1. In these cases, vehicles must deliver the current load before picking up the next load.

2. Vehicles can carry up to some counted limit simultaneously. See [JO86].

Using transportation of people as an example, a vehicle like a bus can carry some number of people according to the number of seats it has. The limit on simultaneous fares does not take into account the size of each passenger, only the count. It may work the same way with standardized loads, like shipping containers.

3. Vehicles can carry up to some weight limit simultaneously. See [DC05] [DDS91].

Here, the individual size attribute of each task is taken into account. The sum of weights of loaded jobs must remain below a constraint, the capacity of the vehicle.

4. Vehicles can divide tasks, taking only the portion they can carry. See [FMP96].

In the case that a vehicle is not full, but does not have the remaining capacity to take an entire task, the vehicle has the option to take only a portion of the task. Effectively, that task is divided into multiple smaller tasks, which vehicles are more able to carry.

7.3.10 What ability is there to cancel a task?

This is often ignored, though it does offer functionality which would be useful on occasion. If a job is no longer required, no longer possible, or was entered in error, there should be a way to remove it from the system, freeing those resources that would be on their way to the pickup site.

There are cases where features used to overcome the eager bidder problem can be used to abandon an assignment. Sometimes, but not in all cases, this could be adapted to remove a job completely.

Cancellation will at least be acknowledged here, even if the original authors did not address it.

Possible answers, with some overlap:

1. Not acknowledged by the system. See [MHH07] [HE+02] [JO86] [DDS91].

Some systems are static and do not allow any changes to a schedule. Others do allow new tasks to be added mid-execution, but do not acknowledge the possibility of those tasks being canceled.

2. No commitments are made which would need to be canceled. See [KB+10] [WBH08]-F.

In some systems, no explicit assignments are made until the moment the task is picked up. This is common in the case where a vehicle does not have the concept of a task.

3. Agents have the ability to forget or trade away assigned tasks. See [WBH08]-D [MS+10]-A [FMP96].

A task assignment may look optimal in the short term and turn out not to be when more knowledge is available. There are many strategies for unassigning tasks in some systems: contingencies, levelled contracts, reallocations, exchanges, and rejections. In some cases, these could make up an improvised cancellation ability.

4. There is an explicit method for removing tasks. See [MS+10]-O [DC05].

In complete logistics functionality, the modifications of a plan will involve both additions and removals. An interface is provided which allows for the cancellation of a task.

Chapter 8

Conclusion

In this thesis we have provided an extension to the Efficiency Improvement Advisor, which was previously described in [SD+11]. Extending the Advisor was done for 2 immediate purposes. The first was to incorporate knowledge from many disparate sources. Certain knowledge (tasks where a customer pays in advance, for example) was the primary motivation for doing this. As a result of including certain knowledge, there is now a step where arbitrary knowledge can be injected, if new sources are discovered in the future.

Related to the first purpose, the second is concerned with how the Extended Efficiency Improvement Advisor handles the inevitably larger volumes of knowledge, most importantly when the knowledge is non-recurring. Advice (still in the form of exception rules) from the EEIA is arranged so that it leads the system under advice to follow an explicit plan. Where the EIA only corrected bad behavior, the EEIA now creates advice for every task, whether the behavior was bad or not. When non-recurring knowledge is included, the optimal plan can be entirely different every day, so the distinction between good and bad behavior will also be different. Explicit plans become favorable over incremental corrections as a result.

Most importantly, the main strength of the EIA is that it is not a central control, not an essential component, and not a single point of failure. The EEIA, with its new features, is also none of these things. Observations of the world are only made via agent testimony, and influence over the world is only via exception rules. Both receiving agent histories and sending rules are done unobtrusively, “after hours”, so that a system which operates with unreliable communications is still able to do so when the EEIA is applied.

A new procedure for generating sets of exception rules representing explicit plans was given. This procedure was defined in a manner abstracted to the level of Dynamic Task Fulfillment. As a consequence, the EEIA is applicable in any scenario where autonomous agents have been applied to a problem in the class of Dynamic Task Fulfillment.

Through experiment, the benefits of the EIA were evaluated in an instantiation of Pickup and Delivery with Time Windows. This is an excellent example of Dynamic Task Fulfillment, and it was reused in the evaluation of the EEIA. Pickup and Delivery continues to be a useful area for a concept like the EEIA, since it is directly applicable to industry, and efficiency gains result in money saved. Despite this, Pickup and Delivery cannot demonstrate a very important quality: generality to all of Dynamic Task Fulfillment. So, to this end, all experiments have been repeated in a second problem domain: Whack-a-mole. The same results and same benefits (better in some cases) which have been shown in Pickup and Delivery also result from applying the EEIA to Whack-a-mole.

Whack-a-mole itself could appear to be a frivolous game, especially when compared to other transportation examples, but this ignores how close a metaphor it is for many real-world problems.

The important thing to note is that an identical analytics and advice procedure (of the EEIA) which was used for Pickup and Delivery has been applied in a completely different setting with the same result. For Pickup and Delivery we are able to measure the quality of solutions both by distance and by time, and get the desired improvements. With Whack-a-mole, we have changed the quality measure again: hit/miss ratio. To add further diversity, 2 different dynamic solutions were used.

The first, FIFO (used for single hammers), is a commonly chosen approach, not for its performance but for its simplicity. The second, Nearest Neighbor (used for multiple hammers), is far more aggressive. These 2 dynamic strategies serve as instruction for the general method of implementing the EEIA's rules in a rule-following agent. In experiments, these 2 strategies also show the effect on performance when an agent is able to follow these rules.

Experiments have been done in 3 series, each showing the 2 application domains in parallel. The first series of experiments, the Degree of Dynamism, showed the effect of the amount of knowledge used. In both application domains, it is clear that performance correlates with the amount of knowledge given to the EEIA. Providing additional knowledge to the Advisor was the primary reason for extending it, and these experiments show that providing the ability to do this is worthwhile, by all of the measures used.

The second series of experiments shows what happens when full knowledge is provided to the EEIA, and, as a consequence, the EEIA's ability to make a dynamic system follow a plan. Apart from occasional inconveniences such as crowding, vehicle deadlock, and traffic jams, for both Pickup and Delivery and Whack-a-mole, the dynamic agents, under the influence of exception rules, have shown the ability to do precisely this.

Finally, the third series of experiments comes the closest to recreating the EIA's original test environment. It has full runs, with some tasks recurring, and others not, creating the potential for learning. The difference in this series (both from previous series, and from previous testing of the EIA) is that we craft runs in such a way that there is noise, preventing the recurring tasks from being learned with complete accuracy. In this setting we see the EEIA exposed to a small portion of misleading knowledge, where a task is predicted and included in the plan, but ultimately does not appear. This is obviously harmful to performance, but an accurate reflection of learning from noisy data in real life. Through this series of experiments we show that despite this misleading knowledge, a net gain in performance is still to be expected. Under these circumstances, for large enough run instances, we see an efficiency gain of the EIA over the unadvised dynamic system, and a further gain of the EEIA's advice over the EIA's advice.

In the future, there are many other important instantiations of Dynamic Task Fulfillment which could benefit from the EEIA's application. Another problem of concern to industry would be the Job Shop Problem, in which the EEIA would likely perform well. There are also other variants of Pickup and Delivery, such as those where fungible commodities are transported, rather than pairing the origin and destination. These would require a different task structure, but should be an equally suitable testing arena. Success or failure in any of these new application domains depends on the tuning of parameters (preptime, timeout, etc.; see Section 3.2). To enhance the hand-tuned parameters, or to ensure that they are set correctly in new environments, learning and self-adaptation of these variables would be a valuable enhancement. Given the consistent results seen in different problem domains, and by a variety of measures, there is reason to be confident that similar results will be seen here as well.

Bibliography

[BCL10] G. Berbeglia, J-F. Cordeau, G. Laporte, Dynamic pickup and delivery problems, European Journal of Operational Research 202.1, 2010, pp. 8–15.

[BC+07] G. Berbeglia, J-F. Cordeau, I. Gribkovskaia, G. Laporte, Static pickup and delivery problems: a classification scheme and survey, TOP 2007, Vol. 15, pp. 1–31.

[BM+06] J. Branke, M. Mnif, C. Müller-Schloer, H. Prothmann, Organic Computing: Addressing complexity by controlled self-organization, In Leveraging Applications of Formal Methods, Verification and Validation, IEEE, 2006, pp. 185–191.

[BFV97] H.-J. Bürckert, K. Fischer, G. Vierke, Teletruck: A holonic fleet management system, In Proceedings of the 14th European Meeting on Cybernetics and Systems Research, 1998, pp. 695–700.

[DPH07] P. Davidsson, J.A. Persson, J. Holmgren, On the integration of agent-based and mathematical optimization techniques, In Agent and Multi-Agent Systems: Technologies and Applications, 2007, pp. 1–10.

[DC05] K. Dorer, M. Calisti, An adaptive solution to dynamic transport optimization, In Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems, ACM, 2005, pp. 45–51.

[DDS91] Y. Dumas, J. Desrosiers, F. Soumis, The pickup and delivery problem with time windows, European Journal of Operational Research 54.1, 1991, pp. 7–22.

[FK93] K. Fischer, N. Kuhn, A DAI approach to modeling the transportation domain, Research Report RR-93-25, DFKI, 1993.

[FMP96] K. Fischer, J.P. Müller, M. Pischel, Cooperative transportation scheduling: an application domain for DAI, Applied Artificial Intelligence 10.1, 1996, pp. 1–34.

[FSS03] K. Fischer, M. Schillo, J. Siekmann, Holonic multiagent systems: A foundation for the organisation of multiagent systems, In Holonic and Multi-Agent Systems for Manufacturing, Springer Berlin Heidelberg, 2003, pp. 71–80.

[GK+06] S. Gutiérrez, S.O. Krumke, N. Megow, T. Vredeveld, How to whack moles, Theoretical Computer Science, 2006, pp. 329–341.

[Har75] J.A. Hartigan, Clustering Algorithms, John Wiley and Sons, 1975.

[HE+02] M.C. Heijden, M.J.R. Ebben, N. Gademann, A. van Harten, Scheduling vehicles in automated transportation systems, OR Spectrum, Vol. 24, 2002, pp. 31–58.

[HH+02] M.C. Heijden, A. van Harten, M.J.R. Ebben, Y.S. Saanen, E. Valentin, A. Verbraeck, Using simulation to design an automated underground system for transporting freight around Schiphol airport, Interfaces 32(4), 2002, pp. 1–19.

[Hud11] J.W. Hudson, Risk Assessment and Management for Efficient Self-Adapting Self-Organizing Emergent Multi-Agent Systems, Master's Thesis, University of Calgary, 2011.

[HD+10] J. Hudson, J. Denzinger, H. Kasinger, B. Bauer, Efficiency Testing of Self-adapting Systems by Learning of Event Sequences, ADAPTIVE 2010, pp. 200–205.

[IBM06] IBM: Autonomic Computing Whitepaper, An Architectural Blueprint for Autonomic Computing, URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.150.1011&rep=rep1&type=pdf, June 2006.

[JO86] J.J. Jaw, A.R. Odoni, H.N. Psaraftis, N.H. Wilson, A heuristic algorithm for the multi-vehicle advance request dial-a-ride problem with time windows, Transportation Research Part B: Methodological, 20(3), 1986, pp. 243–257.

[Ka10] H. Kasinger, Design and Operation of Efficient Self-Organizing Systems, PhD diss., University of Augsburg, 2010.

[KBD09] H. Kasinger, B. Bauer, J. Denzinger, Design pattern for self-organizing emergent systems based on digital infochemicals, In IEEE Engineering of Autonomic and Autonomous Systems, 2009, pp. 45–55.

[KB+10] H. Kasinger, B. Bauer, J. Denzinger, T. Holvoet, Adapting environment-mediated self-organizing emergent systems by exception rules, In Proceedings of the Second International Workshop on Self-organizing Architectures, ACM, 2010, pp. 35–42.

[KDB08] H. Kasinger, J. Denzinger, B. Bauer, Digital semiochemical coordination, Communications of SIWN, 2008, pp. 133–139.

[LMS02] A. Larsen, O.B.G.D. Madsen, M. Solomon, Partially dynamic vehicle routing: models and algorithms, Journal of the Operational Research Society, 53(6), 2002, pp. 637–646.

[vLH15] R.R. van Lon, T. Holvoet, Towards systematic evaluation of multi-agent systems in large scale and dynamic logistics, In International Conference on Principles and Practice of Multi-Agent Systems, Springer International Publishing, 2015, pp. 248–264.

[vLH+12] R.R. van Lon, T. Holvoet, G. Vanden Berghe, T. Wenseleers, J. Branke, Evolutionary synthesis of multi-agent systems for dynamic dial-a-ride problems, In Proceedings of the 14th Annual Conference Companion on Genetic and Evolutionary Computation, 2012, pp. 331–336.

[MS+10] T. Mahr, J. Srour, M. de Weerdt, R. Zuidwijk, Can agents measure up? A comparative study of an agent-based and on-line optimization approach for a drayage problem with uncertainty, Transportation Research Part C: Emerging Technologies, 2010, Vol. 18, pp. 99–119.

[MSW11] T. Mahr, F.J. Srour, M. de Weerdt, Using simulation to evaluate how multi-agent transportation planners cope with truck breakdowns, In IEEE International Conference on Networking, Sensing and Control (ICNSC), 2011, pp. 139–144.

[MHH07] M. Mes, M. van der Heijden, A. van Harten, Comparison of agent-based scheduling to look-ahead heuristics for real-time transportation problems, European Journal of Operational Research 181.1, 2007, pp. 59–75.

[MvL+14] J. Merlevede, R.R. van Lon, T. Holvoet, Neuroevolution of a multi-agent system for the dynamic pickup and delivery problem, In International Joint Workshop on Optimisation in Multi-Agent Systems and Distributed Constraint Reasoning (co-located with AAMAS), 2014.

[PDH08a] S.N. Parragh, K. Doerner, R.F. Hartl, A survey on pickup and delivery models part I: Transportation between customers and depot, Journal für Betriebswirtschaft, 58, 2008, pp. 21–51.

[PDH08b] S.N. Parragh, K. Doerner, R.F. Hartl, A survey on pickup and delivery models part II: Transportation between pickup and delivery locations, Journal für Betriebswirtschaft, 58, 2008, pp. 81–117.

[SLT08] R. Schumann, A.D. Lattner, I.J. Timm, Management by exception: a modern approach to managing self-organizing systems, Communications of SIWN, 2008, pp. 168–172.

[Sha94] M. Shaw, Beyond objects: A software design paradigm based on process control, ACM SIGSOFT Software Engineering Notes, 1995, pp. 27–38.

[Sm80] R.G. Smith, The contract net protocol: High-level communication and control in a distributed problem solver, In IEEE Transactions on Computers, 1980, pp. 1104–1113.

[SD+10] J.-P. Steghöfer, J. Denzinger, H. Kasinger, B. Bauer, Improving the Efficiency of Self-Organizing Emergent Systems by an Advisor, Seventh IEEE International Conference and Workshops on Engineering of Autonomic and Autonomous Systems, 2010, pp. 63–72.

[SD+11] T. Steiner, J. Denzinger, H. Kasinger, B. Bauer, Pro-active Advice to Improve the Efficiency of Self-Organizing Emergent Systems, Eighth IEEE International Conference and Workshops on Engineering of Autonomic and Autonomous Systems, 2011, pp. 97–106.

[TP+11] S. Tomforde, H. Prothmann, J. Branke, J. Hähner, M. Mnif, C. Müller-Schloer, U. Richter, H. Schmeck, Observation and control of organic systems, In Organic Computing: A Paradigm Shift for Complex Systems, Springer, Basel, 2011, pp. 325–338.

[WBH08] D. Weyns, N. Boucke, T. Holvoet, A field-based versus a protocol-based approach for adaptive task assignment, Autonomous Agents and Multi-Agent Systems, Vol. 17, no. 2, 2008, pp. 288–319.

[Wit08] The Rock-afire Explosion, Directed by Brett Whitcomb, Connell Creations, 2008.

[WH04] D. Weyns, T. Holvoet, A formal model for situated multi-agent systems, Fundamenta Informaticae, 63(2-3), 2004, pp. 125–158.

Appendix A

Optimization Methods

The EIA and the EEIA operate in a way that is abstracted to the level of DTF. In order to optimize problems, the optimizers used to create advice must also be abstracted. This appendix describes 2 approaches for doing this that are flexible enough for all of DTF, and make no explicit reference to either PDPTW or Whack-a-mole (or any other problem).

Given a list of tasks (a run instance) and a quality measure, either of these methods will be able to find the optimal set of assignments (or a good estimate), without needing to know the underlying detail of the problem.

We have a general formalism, which we use to describe any kind of search (including both of these approaches). This formalism is built upon a graph, named the search model. The nodes in this graph are search states (usually containing partial solutions). Transitions make up possible steps from an initial state to a goal state. A search control is a function which has the responsibility of choosing the “next” search state, given any current search state.

The following describes both approaches in terms of their search model and search control.

A.1 Genetic Algorithm

Genetic algorithms are approaches to search which are inspired by biology and evolution. In these, a set of individuals are mixed, combined, and mutated randomly, in a way that gives a slight preference to better individuals. What is normally called a quality measure is called a fitness function in genetic algorithms. This is done as an analogy to "survival of the fittest". The fitness function can be applied in many ways. Often, it is used during selection, to give fitter individuals a better chance of being mutated (analogously, reproducing). Other approaches, such as this one, use it to remove worse individuals, and prevent them from being selected. In both cases, it manifests as subtle pressure over the long term, which eventually improves the population.

What follows is essentially the genetic algorithm that was used by the EIA in [HD+10]. It operates at the level of run instances, tasks, solutions, and assignments. All problem-specific detail is hidden in the function $qual: Solution \rightarrow \mathbb{R}$.

The Solution structure, described in Definition 2.1.5, is what will be converted into a plan in the derive stage. It is precisely what we are optimizing, and its measure is the provided $qual: Solution \rightarrow \mathbb{R}$ function.

Definition A.1.1 (Individual). An individual, $indi \in Indi$, is a list of assignments to the run instance:

$$indi = \langle (ta_1, \mathcal{A}g_1, t_1), \ldots, (ta_m, \mathcal{A}g_m, t_m) \rangle$$

So, in other words, individuals are solutions. This makes an obvious choice for fitness:

Definition A.1.2 (Fitness Function). A fitness function assesses the quality of an individual numerically:

$$fit: Indi \rightarrow \mathbb{R}$$

The provided $qual: Solution \rightarrow \mathbb{R}$ will be used as fitness:

$$fit = qual$$

As expected, a set of individuals is a population:

Definition A.1.3 (Population). A population is a set of individuals:

$$Pop = \mathcal{P}(Indi)$$

The search model is then:

Definition A.1.4 (Search Model). The search model is a directed graph, $Mod = (Pop, Tran)$, with:

$$Tran \subseteq Pop \times Pop$$

In other words, $(pop_1, pop_2) \in Tran$ if and only if $pop_1 \rightarrow pop_2$ is a permitted transition between populations.

A selection operator will choose individuals from our population. While a roulette wheel is a common selection method, it was not used here. We use a random selection that does not consider the fitness of individuals. Preference for fitness is provided by other parts of the search.

Definition A.1.5 (Selection). Based on a weighted (newRand) random value, the selection function,

$$sel: Pop \rightarrow Indi$$

will return one of two things: a new random individual (10%), or a random member of the population (90%).
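A minimal sketch of this selection operator in Python (the helper new_random_individual stands in for the new operator defined below; the names are illustrative, not taken from the thesis's implementation, and the population is assumed to be held as a list):

```python
import random

NEW_RAND = 0.10  # preference for new individuals (newRand in Table A.1)

def sel(population, new_random_individual):
    """Selection: with probability newRand return a brand-new random
    individual; otherwise return a uniformly random member of the
    population. Fitness is deliberately ignored here; selective
    pressure comes from the culling step of the search control."""
    if random.random() < NEW_RAND:
        return new_random_individual()
    return random.choice(population)
```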

Our mutation operator is fairly typical. It randomly mutates a single individual, but ensures that the validity of the solution is preserved: no task will be assigned to multiple agents.

Definition A.1.6 (Mutation Operator). The mutation operator takes one individual and returns a new individual, slightly modified:

$$mut: Indi \rightarrow Indi$$

Given a $sol = \langle (ta_1, \mathcal{A}g_1, t_1), \ldots, (ta_m, \mathcal{A}g_m, t_m) \rangle$, a single randomly chosen assignment is modified, while the one-assignment-per-task property is preserved.
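Such a validity-preserving mutation can be sketched as follows. The assumption here (ours, consistent with the definition but not spelled out in the text) is that mutation re-randomizes the agent and fulfillment time of one assignment; random_valid_time is a hypothetical helper that samples a feasible time:

```python
import random

def mut(indi, agents, random_valid_time):
    """Mutation: copy the individual, pick one assignment at random,
    and give it a fresh random agent and a fresh (valid) fulfillment
    time. The task in the assignment is kept, so the individual still
    contains exactly one assignment per task of the run instance."""
    child = list(indi)  # an individual is a list of (task, agent, time)
    i = random.randrange(len(child))
    task, _, _ = child[i]
    new_agent = random.choice(agents)
    # random_valid_time is a hypothetical helper (not from the thesis)
    child[i] = (task, new_agent, random_valid_time(task, new_agent, child))
    return child
```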

The crossover operator also operates in the typical way, but enforces the same validity. It combines assignments from 2 solutions into a single solution. Each assignment in the new individual will be borrowed from one of the parent individuals. As with the Mutation Operator, the Crossover Operator will enforce the requirement that the individual must have one assignment per task in the run instance.

Definition A.1.7 (Crossover Operator). The crossover operator takes 2 individuals and returns a new individual, taking parts from each parent:

$$cross: Indi \times Indi \rightarrow Indi$$

The crossover operator selects a random index; tasks prior to this index will take their assignments from the first argument solution. The remaining tasks take their assignments from the second argument solution.

The new operator generates random valid solutions, with each task assigned exactly one vehicle.

Definition A.1.8 (New Random Operator). This operator creates a new individual from no arguments:

$$new: \emptyset \rightarrow Indi$$

This is used to create the initial population, among other things.

The search control provides the procedure which combines all the previously defined components.

Definition A.1.9 (Search Control). The search control will be a map:

$$Cont: Pop \times Env_{GA} \rightarrow Pop$$

The control will follow the search model, so that if $cont(pop_1) = pop_2$, there is a transition in the model where:

$$(pop_1, pop_2) \in Tran$$

The search control will construct a new population one individual at a time, via the new random operator. Once an initial population of size minSize is constructed, the following steps will be iterated:

1. Select an operator, mut (75% chance) or cross (25% chance), by weighted (mutCross) random selection.

2. Call sel to select the required number of individuals for the chosen operator.

3. Apply the chosen operator to the selected individual(s) and add the result to the population.

4. This new individual is compared against the "best so far", and recorded if better.

5. If the population size exceeds maxSize, the least fit individuals are removed until the population has size minSize.

6. Repeat until the output population reaches the required size.

These steps are iterated generations times. See Table A.1.

Parameter Name   Description                       m < 20    m ≥ 20
minSize          Minimum population size           225       3225
maxSize          Maximum population size           550       3550
generations      Iterations of GA                  100000    15000000
mutCross         Preference for mutation           0.75      0.75
newRand          Preference for new individuals    0.10      0.10

Table A.1: Some parameters adjusted for problem size, where m is the number of tasks.

After the desired number of iterations complete, the solution determined "best so far" is returned as the estimated optimal solution.
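Putting the pieces together, the whole search control can be sketched as below. This is a sketch only: it assumes lower qual/fit values are better (minimization, matching the comparison used in the Branch and Bound control below), and assumes sel, mut, and cross have been wrapped so that the run instance, agent set, and time sampling are fixed; parameter values are those of Table A.1 for m < 20:

```python
import random

MIN_SIZE, MAX_SIZE = 225, 550  # population bounds (Table A.1, m < 20)
GENERATIONS = 100000           # iterations of the GA (Table A.1, m < 20)
MUT_CROSS = 0.75               # preference for mutation over crossover

def ga(fit, new, sel, mut, cross):
    """Sketch of the GA search control: build an initial population
    with `new`, then repeatedly apply mutation or crossover, track the
    best individual seen so far, and cull the least fit individuals
    whenever the population grows past MAX_SIZE."""
    pop = [new() for _ in range(MIN_SIZE)]
    best = min(pop, key=fit)
    for _ in range(GENERATIONS):
        if random.random() < MUT_CROSS:
            child = mut(sel(pop, new))                   # steps 1-3: mutate
        else:
            child = cross(sel(pop, new), sel(pop, new))  # or recombine
        pop.append(child)
        if fit(child) < fit(best):                       # step 4: best so far
            best = child
        if len(pop) > MAX_SIZE:                          # step 5: cull
            pop.sort(key=fit)
            del pop[MIN_SIZE:]
    return best
```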

A.2 Branch and bound

Branch and bound is a form of tree search used for optimization. Unlike the Genetic Algorithm described earlier, it can be used to find the exact optimal. While this does provide a certain benefit to the end user, in terms of solution quality, it is not useful in practice for larger problems. There are variations where this search can exit early to return an estimate, but that is not what is described here. Branch and bound will only be applied to small experiments here, so time requirements are not an issue.

The search model for Branch and Bound is an And-tree search model. An And-tree will be used to represent the division of the search space of solutions into partitions.

Definition A.2.1 (And-tree). An And-tree will be either a single node, with content prob (determining a particular subspace) and a solution status:

$$tree = (prob, sol)$$

or a node with a list of subtrees,

$$tree = (prob, sol, t_1, \ldots, t_n)$$

where each $t_1, \ldots, t_n$ is an And-tree, and $sol \in \{?, yes\}$.

Partial solutions (a list of tasks, acting as prefix for a full solution, as in Def. 2.1.5) to the run instance, $ri$, will be used as a realization of $prob$. Given a partial solution $par$, the space described would be solutions $sol \in Solution$ where $par \subseteq sol$.

We will also require that the best solution quality found so far is recorded throughout. This brings us to the definition of a Search State.

Definition A.2.2 (Search State). A search state $(tree, best) \in S$ is a pair:

• $tree$ - an And-tree

• $best \in \mathbb{R} \cup \{\infty\}$ - the quality of the best solution found so far ($\infty$ when one is not found yet)

Transitioning from one search state to another will primarily be done by selecting a leaf node in the tree, and expanding it into a partition of subproblems (new leaf nodes as child nodes). The branching will be determined by a relation Div from partial solutions to lists of partial solutions:

$$(par, \langle s_1, \ldots, s_z \rangle) \in Div$$

if and only if each $s_i = par \cup \{(ta_i, \mathcal{A}g_i, t_i)\}$, where:

• $ta_i = (detail_i, t_i^{start}, t_i^{end})$ is any task in $ri$, but not yet assigned in $par$.

• $\mathcal{A}g_i \in A$ - any of the available agents.

• $t_i$ - calculated as the $t \in [t^{early}, t_i^{end}]$ which gives the earliest case with the best $f_{bound}(s_i)$, where

$$t^{early} = \max\left(\{\, t_q \mid (ta_q, \mathcal{A}g_i, t_q) \in par \,\} \cup \{ t_i^{start} \}\right) + 1$$

Of course, Div requires $t^{early} < t_i^{end}$, as it will simply omit impossible assignments.

The ti calculation here attempts to the predict actual time of fulfillment for the task, although this is not typically crucial for rule derivation. Specific to the single hammer Whack-a-mole case, the bounding function is chosen to penalize early times so this can predict earliest possible fulfillment time exactly.

When selecting a leaf node to expand by application of Div, nodes will be given priority first by depth, with a secondary priority of minimizing the latest assignment time (essentially, minimizing the tearly calculation).

A leaf node will be considered solved when ri = prob . Since uniqueness of tasks in assignments | | | | is preserved by Div, this will only happen when prob contains exactly one assignment per task, and so will satisfy Definition 2.1.5.

 The initial search state will be s0 = ( ,?), . This is the empty solution, with no tasks assigned. ∅ ∞ As it is a subset of every Solution, this represents the search space of all solutions.

The search control can then be defined:

Definition A.2.3. (Search Control) The search process will be a function K : S nv S, where × E → an argument (tree,best),e will be handled as follows:

132 1. Select the unsolved leaf node from tree with best priority.

2. if lea f = (par,?) is solved, define updated tree0 with this node changed to (par,yes), then,

if qual(par) < best, return (tree0,qual(par)) • else, return (tree0,best) • 0 3. if lea f = (par,?) is unsolved, and fbound(par) best, define updated tree with this ≥ node changed to (par,yes) and return (tree0,best).

0 4. if lea f = (par,?) is unsolved, calculate (par,s1,...,sn) Div, define updated tree ∈ 0 with this node changed to (par,?,(s1,?),...,(sn,?)) and return (tree ,best).

And the search will halt when the goal function is met:

Definition A.2.4 (Goal Function). The goal function is a map $G: S \rightarrow \{yes, no\}$:

$$q \mapsto \begin{cases} yes & : q = ((par, yes), best) \\ yes & : q = (par, ?, t_1, \ldots, t_n) \text{ with } G(t_i)\ \forall i \in [1, n] \\ no & : \text{else} \end{cases}$$

As a result of expanding and exploring the tree, a sequence of updates to the best attribute of the state will be performed. With each update, a new solution has been discovered which is better than the previously known best. The ultimate solution in this sequence will be the optimal and the one returned by the search.
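The same control, sketched in Python over partial solutions directly rather than over an explicit And-tree (depth-first expansion approximates the depth-first leaf priority described above; qual and f_bound are assumed to be supplied by the problem instantiation, with lower values better, and div is the branching function sketched earlier):

```python
import math

def branch_and_bound(ri, div, qual, f_bound, agents):
    """Sketch of the Branch and Bound control: expand partial solutions
    depth-first, prune any branch whose bound cannot beat the best
    known quality (step 3), and record every improvement (step 2)."""
    best_q, best_sol = math.inf, None
    stack = [[]]                            # the empty partial solution
    while stack:
        par = stack.pop()
        if len(par) == len(ri):             # solved: one assignment per task
            if qual(par) < best_q:
                best_q, best_sol = qual(par), par
            continue
        if f_bound(par) >= best_q:          # bound: prune this subtree
            continue
        stack.extend(div(par, ri, agents))  # branch via Div (step 4)
    return best_sol, best_q
```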

Permission to use copyrighted material

From: Joerg Denzinger < @ucalgary.ca>, Tue, Jul 24, 2018 at 2:15 PM
To: Nick Nygren < @gmail.com>
Cc: Joerg Denzinger < @cpsc.ucalgary.ca>

Dear Nick,

I hereby grant you permission to use the pictures in the files EIA-Functional-Architecture.pdf and exceptionRules.pdf that I provided you with in your thesis.

Yours,
Joerg Denzinger

