App ears in the Pro ceedings of the Fifteenth National Conference on Articial Intelligence AAAI pp

Learning to Predict User Op erations for Adaptive

Melinda T Gervasio and Wayne Iba and Pat Langley

Institute for the Study of Learning and Exp ertise

Staunton Court Palo Alto California

fgervasioibalangleygisleorg

Abstract of those abilities Machine learning can help address

these problems by providing adaptive systems that au

Mixedinitiative systems present the challenge of nd

tomatically tailor their b ehavior to dierent users

ing an eective level of interaction b etween humans

In this pap er we present an empirical study of learn

and computers Machine learning presents a promis

ing user mo dels in an adaptive scheduling assistant

ing approach to this problem in the form of systems

We b egin by describing our application a synthetic do

that automatically adapt their b ehavior to accommo

main that involves resp onding to hazardous materials

date dierent users In this pap er we present an em

incidents We also describ e Inca the interactive as

pirical study of learning user mo dels in an adaptive

sistant we are developing for this domain Traces of

assistant for crisis scheduling We describ e the prob

lem domain and the scheduling assistant then present

user interactions with Inca provided the data for our

an initial formulation of the adaptive assistants learn

learning exp eriments We formulate the scheduling as

ing and the results of a baseline study After this

sistants task of predicting user op erations as a classi

we rep ort the results of three subsequent exp eriments

cation problem and we discuss the results of a baseline

that investigate the eects of problem reformulation

study which showed some b enet from learning but

and representation augmentation The results suggest

also left ro om for improvement The three subsequent

that problem reformulation leads to signicantly b et

exp eriments investigate the eects of problem represen

ter accuracy without sacricing the usefulness of the

tation and formulation on p erformance Their results

learned b ehavior The studies also raise several inter

show that problem reformulation can lead to much b et

esting issues in adaptive assistance for scheduling

ter adaptation without sacricing the usefulness of the

learned concepts The exp eriments raised a numb er of

issues which we consider in our closing discussion of

Intro duction

related and future work

In recent years there has b een a surging interest in

the mixedinitiative paradigm where multiple agents

Scheduling for Crisis Resp onse

sp ecically humans and software systemsshare con

A dominant theme in crisis resp onse is urgencyan

trol by partitioning the resp onsibilities in problem solv

agent is comp elled to act to avert an undesirable sit

ing This trend holds not only for the ubiquitous appli

uation in a limited amount of time Finding an ecient

cations on the Web but also in domains such as plan

level of interaction b etween the human user and the

ning and scheduling where the traditional AI approach

computer is thus particularly imp ortant Our resp onse

has involved autonomous systems Ideally a mixed

to urgency relies on machine learning to acquire user

initiative system b enets from the individual strengths

mo dels to facilitate this interaction To illustrate our

of its constituentsthe exp ertise of the human user and

ideas and to lay the groundwork for the exp eriments in

the computational p ower of the computer For this syn

this pap er we will discuss them in the of a syn

ergy to o ccur however the division of resp onsibilities

thetic hazardous materials domain HazMat and the

must b e mutually b enecial and the two participants

INteractive Crisis Assistant Inca that we develop ed

must b e able to work together eectively Meeting these

for this domain Inca provides assistance for b oth plan

requirements with a single static system will b e dicult

ning and scheduling but we will fo cus on the scheduling

b ecause dierent users will have dierent abilities and

task here

preferences In addition as software systems b ecome

more p owerful they may outgrow the users ability to

HAZMAT Resp onse Using INCA

communicate tasks or requests that b est take advantage

In developing the synthetic HazMat domain we con

sulted the North American Emergency Resp onse

c

Copyright American Asso ciation for Articial In

Guideb o ok Transp ort Canada et al a hand

telligence wwwaaaiorg All rights reserved

b o ok for rst resp onders to hazardous materials inci

scription or overallo cation of any resource Using the dents A HazMat problem consists of a spill and p os

ve available op erators the user mo dies the sibly a re involving one of typ es of hazardous ma

until it is feasible and he considers it acceptable terial There are classes of HazMat incidents

varying in the typ e and amount of material involved

Inca currently assists the user in scheduling by pro

the lo cation of the incident and the characteristics of

viding an initial candidate schedule retrieved from a

the spill and any re Incidents also have asso ciated re

case library by suggesting heuristically determined de

and health hazards that resp ectively characterize the

fault values for resources durations and start times

probability of a re and the danger to p eoples health

and by checking the feasibility of By inte

grating learning into Inca we hop e to improve its as There are typ es of actions and typ es of re

sistant capabilities by letting it adapt its b ehavior to sources available for resp onding to a HazMat incident

In any given problem only a subset of the actions will

individual users

b e applicable as indicated in the Guideb o ok Each ac

tion addresses particular asp ects of a HazMat problem

Acquiring and Applying User Mo dels

and requires some minimum set of resources The sp e

A user will have p ersonal b eliefs ab out what actions

cic resources available and their asso ciated capacity

are appropriate for a problem what actions are more

and quantity constraints vary with every problem

imp ortant than others and what actions should b e p er

Inca is an interactive system that provides planning

formed rst These preferences are reected by the op

and scheduling assistance for HazMat resp onse In

erators that the user selects and their resultswhat

the planning phase the user interacts with Inca to

actions are included and how they are ordered in the

cho ose the schedulable actions a subset of the appli

schedule how resources are allo cated to dierent ac

cable actions to the problem The input to the schedul

tions and how dierent conicts are resolved The goal

ing phase is this set of schedulable actions the set of

of learning in Inca is to extract such information from

available resources and an initial p ossibly empty can

traces of its interactions with the user and to use this

didate schedule provided by Inca

information to adapt its b ehavior accordingly For ex

In a departure from traditional scheduling the task is

ample Inca might use this preference information to

to cho ose some subset of the schedulable actions and to

suggest resources durations and start times that are

assign them to resources Moreover actions may b e al

similar to the users previous choices when adding the

lo cated a variable numb er of resources and arbitrary du

action Or it might apply scheduling op erators based on

ration Consider the action of extinguishing a re with

this information to sp ecically tailor initial candidate

a hose Allo cating more resources reghters hoses

schedules By making suggestions that the user is more

hydrants etc to the action as well as simultaneously

likely to accept Inca can make the HazMat resp onse

scheduling other extinguishment actions eg extin

pro cess more ecient thereby directly addressing the

guish with dry sand will put out the re more quickly

urgency asp ect of crisis resp onse

This interdep endence among actions resources and ef

As our rst step in integrating learning into Inca

fects makes it dicult to completely determine resource

we fo cused on the prediction of scheduling op erations

requirements and action durations prior to scheduling

Sp ecically Incas user mo deling task was Given a

With eective heuristics b eing dicult to engineer for

particular scheduling stateas characterized by the

the resulting underconstrained problem machine learn

problem available resources schedulable actions and

ing b ecomes even more attractive

current schedulepredict the users next scheduling op

Scheduling an action involves four decisions the

eration This can b e translated into a standard classi

numb er of resources to allo cate to the action the sp e

cation task with the class b eing the scheduling op er

cic resources to allo cate the start time and the dura

ation and the instance b eing the scheduling state for

tion Scheduling in Inca takes place in a repair space

which the prediction is b eing made

where the op erators include adding and removing ac

tions from the schedule as well as mo difying parame

Baseline Study Eects of Learning

ters of scheduled actions Sp ecically the user interacts

We extracted the data for our learning exp eriments with Inca through a graphical interface that provides

from the individual traces of two users each interacting the user with ve scheduling op erators add a new ac

with Inca over HazMat problems Each schedul

tion remove an action shift the start time of an action

ing op eration p erformed while solving a problem corre change the duration of an action and switch an action

sp onds to a training or test instance From traces of the from one resource to another

rst and second users we extracted examples data

The nal schedule must b e feasiblethat is it must

set A and examples data set B resp ectively

not violate any capacity or quantity constraints A re

Nearly all elded applications of machine learning source is oversubscribed if there is any time at which

Langley Simon rely on attributevalue repre the numb er of simultaneously scheduled actions on that

sentations and standard sup ervised induction metho ds resource exceeds its capacity Similarly a resource is

so we decided to investigate the feasibility of such an overal located if the actions scheduled on it consume

approach here This has the advantage of either avoid more than the available resource quantity An infeasibly

ing the cost of engineering a sp ecialpurp ose approach scheduled action is one that participates in the oversub

Table Percentage accuracy in predicting user op era Table Percentage accuracy in predicting user op era

tions during scheduling tions after various problem reformulations

A B A B

test on same user :  : :  : temp oral attributes :  : :  :

test on other user :  : :  : abstracted :  : :  :

alternative classes :  : :  :

no il legal predictions :  : :  :

or justifying the need for more complex representations

and algorithms

For the baseline study we used attributes to de

testing We then ran another ten trials this time us

scrib e the scheduling state We represented the problem

ing the entire other users data set for testing Table

using nominal attributes directly corresp onding to

shows the results for each data set averaged over the

the material spill re and hazard features of the Haz

ten runs for each condition condence intervals

Mat problem We represented the resources with

The results suppp ort b oth our hyp otheses under each

Bo olean features one for each resource typ e with the

condition the resulting accuracies were notably b et

value denoting whether there is at least one resource of

ter than guessing randomly and guessing the

that typ e available Finally we represented the schedu

most frequent class with the p erformance from

lable set of actions and the schedule with attributes

training and testing on data from the same user b eing

corresp onding to the p ossible actions These at

signicantly b etter than that from training on one user

tributes had four p ossible values not schedulable un

and testing on another paired t test p

scheduled feasibly scheduled or infeasibly scheduled

There are several levels at which the prediction task

Improving Accuracy Through

can b e formulated At the highest level is the predic

Problem Reformulation

tion of a general scheduling op erator such as ADD any

action At the lowest level is the prediction of a sp e

The baseline study showed that learning improved p er

cic instantiation of a scheduling op erator such as ADD

formance but even in the b est case for data

the absorbwithdrysand action on crew members

set B withinuser test there were more than twice as

and and dry sand starting at time for

many misclassications as correct classications This

a duration of time units For the baseline study

calls into question the utility of the predictions in the

we chose an intermediate level requiring the sp ecica

context of a scheduling assistant Our analysis of the

tion of a particular action but not sp ecic resources or

results suggested our formulation of the problem as a

amounts We also combined the remove action change

prime susp ect for this p o or p erformance We identied

duration shift action and switch resource op erators

three ways to reformulate the problem adding contex

into a single REPAIR op erator This resulted in ADD

tual temp oral information abstracting the class lab els

and REPAIR op erations for each of the actions for

and mo difying the prediction task We now describ e

a total of dierent classes

the three exp erimental studies we conducted to test

We can thus state the learning task as Given a set

whether these problem reformulations would increase

of training examples learn a classier that makes cor

predictive accuracy We also prop ose a mo dication to

rect predictions on new examples Preliminary studies

the prediction element that promises substantial p er

with sup ervised learning algorithms from the rep ository

formance improvement

maintained by the Machine Learning Group at the Uni

versity of Texas at Austin revealed ID Quinlan

Adding Temp oral Information

to b e the most promising so we chose to use this in our

In the rst exp eriment we wanted to determine the

1

formal exp eriments

eects of augmenting the problem representation with

We had two hyp otheses for the baseline exp eriment

information ab out the other op erators used in solving

The rst was that learning would improve accuracy and

the problem Our statement of the p erformance task

the second was that greater gains would result if we

requires Inca to predict the users immediate next op

trained and tested on data from the same user than if we

eration This is likely to dep end on the other op erators

trained on data from one user and tested on data from

the user selects within the same problem particularly

another The aim of learning is to mo del a sp ecic user

the most recent op erators selected but the base prob

so we exp ected some b enet from learning on another

lem representation included no such explicit contextual

user but not as much as learning from the same user

information

We tested these hyp otheses by rst running ten tri

Our hyp othesis in this exp eriment was that adding

als on each users data set with each trial using

contextual temp oral information would improve accu

randomly chosen examples for training and the rest for

racy To test this hyp othesis we extended the instance

1

representation with ve attributes to represent the ve

The repository resides at wwwcsutexaseduusersml

most recent op erations the user p erformed prior to the We obtained similar results with the C system 100 A:base

B:base

able Total p ercentage of prediction errors and p er T A:abstracted

B:abstracted

tage due to illegal predictions cen A:alternative classes B:alternative classes 80 A: no illegal predictions

B: no illegal predictions

A B

illegal total illegal

total 60

base condition

temp oral attributes

40

Percentage Accuracy

2

op eration asso ciated with that instance We then ran

20

ten trials on each data set using the same training and

test splits as in the baseline study

able and Figure present the results con T 0 0 100 200 300 400 500 600 700 800

Training Examples

dence intervals from the problem reformulation ex

p eriments The results from the rst exp eriment Ta

ble temp oral attributes did not supp ort our hy

Figure Learning curves for the various exp erimen

p othesis temp oral information did not aect accuracy

tal conditions showing improvement due to successive

One explanation lies in the numb er of errors that in

problem reformulations

volved predicting an illegal op eration Incorrect predic

tions may b e divided into two typ es legal predictions

who must only cho ose from a small set of corresp onding

that corresp ond to op erations that can b e p erformed

more sp ecic actions Abstracting the problem in this

in the scheduling state and il legal op erations that can

manner reduced the original classes into abstract

not Sp ecically an ADDaction op eration is illegal

classes and also increased the prop ortion of the most

if the action is already in the schedule or it is not in

frequent class from to We ran ten trials

the schedulable set a REPAIRaction op eration is ille

on each data set using the same training and test splits

gal if the action is not in the schedule Table shows

as in the baseline study

that the p ercentage of examples that were misclassied

The results supp ort our hyp othesis accuracy in

into illegal op erations increased with the intro duction

creased signicantly paired t test p for b oth

of the temp oral attributes to account for over half of

data sets with class abstraction Table and Figure

all the misclassications These results reveal that the

abstracted This result may not b e surprising since the

problem representation is not suciently balancing the

abstraction results in a simpler prediction task How

necessary contextual and legality information to let the

ever it raises the issue of determining an appropriate

system make go o d legal predictions They also sug

task level in mixedinitiative settings such as interactive

gest additional analysis and exp erimentation on prob

scheduling Formulating the prediction task at higher

lem representation to shed light on this issue

levels transfers more decisionmaking resp onsibility to

the human user but this is not undesirable if it leads

Class Abstraction

to more eective interaction overall

In the second exp eriment we wanted to investigate the

utility of simplifying the problem via class abstraction

Redening Correctness

As stated earlier the prediction task can b e formulated

In the third exp eriment we wanted to investigate the

at dierent levels and for the baseline study we chose

eects of mo difying the prediction task by allowing al

an intermediate level requiring the sp ecication of an

ternative class lab els The formulation of the learning

ADD or REPAIR on a particular action but not the

problem as the problem of predicting the users next

sp ecication of particular resources or amounts

op eration was a natural t to the data provided by the

Our hyp othesis in the second exp eriment was that

user traces However it is an unnecessarily dicult

p erformance would improve on a more abstract predic

learning task in that it requires the system to predict

tion task To test this hyp othesis we chose to abstract

a users immediate next action A user may arrive at

one level to require predicting an ADDREPAIR op

the same schedule through dierent sequences of op er

eration on a class of actions rather than on individual

ations many of which may b e equally acceptable to the

actions For example the abstract ADD extinguish

user Thus a more appropriate learning task might b e

with hose replaces the set of sp ecic ADDs on ac

to predict any one of the users subsequent op erations

tions such as extinguish with hose using water from

sub ject to legality given the current state

a hydrant and extinguish with hose using foam and

Our hyp othesis in this exp eriment was that accuracy

a pump er We felt this was a reasonable abstraction

would improve under the redened correctness crite

in that it makes few additional demands on the user

rion To test this hyp othesis we rst mo died the

2

original examples to include a set of alternative classes

We tried adding one to ve previous op erations and

consisting of all the subsequent op erations the user in found no signicant dierences

voked to solve the HazMat incident minus those that Some recent AI scheduling systems eg Smith et

were illegal in the current state We ran ten trials on al Fukunaga et al have mixedinitiative

each data set again using the same training and test mo des but few incorp orate learning or adaptation

splits Training pro ceeded as b efore but during test to dierent users The same holds for AI crisis re

ing we judged a prediction to b e correct if it matched sp onse systems such as OPlan Tate et al

the instances lab el or one of its alternative classes and SOCAP Bienkowski An exception is CAB

INS Miyashita Sycara an assistant for job

The results supp ort our hyp othesis signicantly

shop scheduling that learns user preferences on repair

greater accuracy paired t test p was achieved

heuristics Like Inca CABINS acquires preferences

on the reformulated task Table and Figure alter

from user traces and uses these to direct a repairspace

native classes Again the results might seem obvious

search for a solution CABINS diers in that it invokes

since allowing alternative classes simplies the predic

a heuristic scheduler to generate an initial schedule and

tion task However as stated earlier this may in fact b e

a casebased metho d to learn user preferences whereas

a more appropriate task in that it may b etter capture

Inca uses a casebased metho d to retrieve an initial

what concerns users in the scheduling pro cess

schedule and other learning techniques to acquire user

preferences We b elieve this approach provides a b etter

Correcting for Illegal Op erations

t to crisis domains where case libraries can b e built

The observation that many errors involved illegal op er

from standard op erating pro cedures or the results of

ations Table suggests another change to the predic

planning exercises

tion element The revised system would check to de

Much work on p ersonalization particularly for com

termine if a given prediction is illegal and if so would

puteraided instruction has fo cused on metho ds for

move on to the next most likely prediction continu

constructing mo dels of user b ehaviors that can then b e

ing in this manner until it predicts a legal op eration

used to classify users and mo dify system interaction ac

Implementing this scheme is straightforward for some

cordingly eg Sleeman Smith Clancey

classiers like naive Bayes and nearest neighb or which

Anderson Reiser Langley uses the

rank their predictions but it is more complicated for

term adaptive user interfaces to refer to systems like

decision trees

Inca that instead use machine learning to construct

We are still in the pro cess of considering various ap

user mo dels from interaction traces Previous work in

proaches to implementing this ltering scheme How

this area include Schlimmer and Hermens inter

ever we can estimate the new accuracy by factoring

face for rep etitive form lling Dent et als CAP

out the existing illegal predictions Let us assume that

a calendar apprentice for scheduling app ointments and

the learned predictors accuracy will b e the same on the

Pazzani et als Syskill Webert a Web

illegal classications as on the legal ones Let a b e the

page recommendation service Like Inca these sys

accuracy with abstraction s b e the gain from alterna

tems use their learned user mo dels to tailor their b e

tive classes and i b e the misclassications that were

havior to individual users The work describ ed in this

illegal The accuracy under this new scheme would b e

pap er diers in its fo cus on an explicit evaluation of the

a+s

a s i This results in accuracies in the to

100

factors aecting success

range Table and Figure no illegal predictions

A priority for future work involves fully integrating

which are much more promising than the baseline re

the learned predictors into Inca which would let us di

3

sults

rectly evaluate our hyp othesis that learning user prefer

ences will pro duce more rapid HazMat resp onse This

Related and Future Work

will also let us test our hyp otheses ab out appropriate

task formulations Initially we plan to use an oine

Previous work on learning for scheduling has fo cused

setting in which we train Inca on user traces incorp o

on autonomous systems with the aim of improving

rate the learned mo del into the system and then let the

scheduling eciency or schedule quality For example

same user interact with Inca while measuring p erfor

Gratch and Chien increased eciency in a deep

mance Eventually we plan to integrate learning in an

space by acquiring eective domain

online setting where the system up dates its user mo del

sp ecic heuristics Similarly Eskey and Zweb en

during the course of its interactions In such settings it

increased scheduling eciency in payload pro cessing for

b ecomes particularly imp ortant that the system avoid

the NASA space shuttle by acquiring rules to avoid

making suggestions that are so unacceptable and dis

chronic Zhang and Dietterich

tracting that the user would have b een b etter o with

used reinforcement learning on the same task

out assistance One resp onse which we plan to explore

but to acquire heuristics that resulted in shorter sched

in future work is to incorp orate condence measures

ules Our framework for learning diers in its emphasis

into system predictions to prevent it from making sug

on acquiring usersp ecic p erformance criteria and as a

gestions until they exceed some minimum threshold

result in its reliance on detailed traces of user decisions

Another imp ortant area for future work is task re

3

formulation The results on abstraction suggest that

For naive Bayes we have obtained actual results that

we lo ok for simple learning problems that would b en were as go o d or b etter than these estimates

Clancey W J Dialogue Management for Rule et from standard induction algorithms One ap

Based Tutorials In Proceedings of the Sixth Interna proach would b e to divide the prediction task into a

tional Joint Conference on Articial Intel ligence set of smaller tasksfor example predicting sp ecic re

sources durations and start times once an action has

Dent L Boticario J McDermott J Mitchell T

already b een selected for scheduling or pro ceeding with

and Zab orowski D A Personal Learning Ap

prediction hierarchically through more and more sp e

prentice In Proceedings of the Tenth National Confer

cic op erations The results on alternative classes sug

ence on Articial Intel ligence

gest that we reevaluate our task denition in light of the

Eskey M and Zweb en M Learning Search

kinds of suggestions ie system predictions the user

Control for ConstraintBased Scheduling In Proceed

will nd helpful We are currently exploring alternative

ings of the Eighth National Conference on Articial

formulations of the learning task along these lines We

Intel ligence

also exp ect to see greater b enets from combinations of

Fukunaga A Rabideau G Chien S and Yan

successful reformulations

D Toward an Application Framework for Au

We also need to revisit our representational options

tomated Planning and Scheduling In Proceedings of

The results on adding temp oral attributes indicate that

the International Symposium on Articial Intel

there may b e imp ortant information that is currently

ligence Robotics and for Space Tokyo

not b eing captured The attributevalue scheme is sim

Japan

ple but it do es not let us easily represent sp ecic sched

Gratch J and Chien S Learning Eective

ules ie what actions use what resources over what

Control Strategies for DeepSpace Network Schedul

time intervals or the relationship b etween what ac

ing In Proceedings of the Tenth International Confer

tions address what problem features This information

ence on Machine Learning

which may play a role in user preferences can b e more

easily captured in a relational representation which we

Langley P and Simon H A Applications of

plan to investigate in future versions of Inca

Machine Learning and Rule Induction Communica

tions of the ACM

Concluding Remarks

Langley P Machine Learning for Adaptive User

In this pap er we reviewed Inca an adaptive assistant

Interfaces In Proceedings of the TwentyFirst German

for crisis scheduling and presented empirical studies

Annual Conference on Articial Intel ligence

of learning user mo dels aimed at improving the sys

Miyashita K and Sycara K CABINS A

tems b ehavior We formulated the task of the schedul

Framework of Knowledge Acquisition and Iterative

ing assistant as predicting the users action in resp onse

Revision for Schedule Improvement and Reactive Re

to a particular problem set of resources and current

pair Articial Intel ligence

schedule We set out to explore the limits of simple

Pazzani M Muramatsu J and Billsus D

problem representations and established learning algo

Syskill Webert Identifying Interesting Web

rithms and our initial study revealed some b enets

Sites In Proceedings of the Thirteenth National Con

from learning but also left ro om for improvement

ference on Articial Intel ligence

Subsequent exp eriments demonstrated that abstrac

Quinlan R Induction of Decision Trees Ma

tions of the learning task as well as reformulations of

chine Learning

the prediction task result in substantially b etter p er

formance without sacricing and p ossibly even improv

Sleeman D H and Smith M J Mo delling

ing the usefulness of the learned b ehavior These nd

Pupils Problem Solving Articial Intel ligence

ings suggest that adaptive interfaces like Inca which

learn mo dels of user preferences constitute a promising

Smith S Lassila O and Becker M Con

approach to mixedinitiative scheduling that deserves

gurable MixedInitiative Systems for Planning and

fuller exploration in future research

Scheduling In Advanced Planning Technology Tate

A ed Menlo Park AAAI Press

Acknowledgments This research was supp orted by

Tate A Drabble B and Kirby R B O

the Oce of Naval Research under Grant N

Plan an Op en Architecture for Command Planning

We would also like to thank Mark Malo of for

and Control In Intel ligent Scheduling Zweb en M and

his helpful comments on the pap er and the Organiza

Fox M S eds San Francisco Morgan Kaufmann

tional Dynamics Center group at Stanford University

for many interesting discussions on crisis resp onse

Transp ort Canada the US Department of Trans

p ortation and the Secretariat of Communications

References

and Transp ortation of Mexico North American

Emergency Response Guidebook

Anderson J R and Reiser B J The LISP Tu

tor Byte

Zhang W and Dietterich T A Reinforcement

Learning Approach to Jobshop Scheduling In Pro Bienkowski M SOCAP System for Op erations

ceedings of the Fourteenth International Conference on Crisis Action Planning In Advanced Planning Tech

Articial Intel ligence nology Tate A ed Menlo Park AAAI Press