
Using Bayesian Inference to Learn High-Level Tasks from a Human Teacher

Mark P. Woodward and Robert J. Wood
Microrobotics Laboratory, School of Engineering and Applied Sciences, Harvard University. e-mail: mwoodward, [email protected]

Abstract— Humans can learn from teachers by observing gestures, reinforcements, and words, which we collectively call signals. Often a new signal will support a different interpretation of an earlier signal, resulting in a different belief about the task being learned. If robots are to learn from these signals, they must perform similar inferences. We propose the use of Bayesian inference to allow robots to learn tasks from human teachers. We review Bayesian inference, describe its application to the problem of learning high-level tasks from a human teacher, and work through a specific implementation on a robot. Bayesian inference is shown to quickly converge to the correct task specification.

I. INTRODUCTION

The ability of robots to perform meaningful work in complex, real-world environments is continually expanding [1], [2], [3]. To take advantage of these abilities, robot users will need a mechanism for defining their own tasks. Many of these users will not be proficient programmers, but will be familiar with teaching other humans through gestures, reinforcements, and words, which we collectively call "signals". It would be useful if they could teach robots using these same signals. The problem is that a signal can often have multiple interpretations, either because of perception errors or because the gesture, reinforcement, or word by itself does not carry enough information to fully specify the task. These multiple interpretations result in multiple task specifications. Fortunately, as more signals arrive, fewer interpretations "make sense" and the task becomes clearer. We propose the use of Bayesian inference to give robots this type of reasoning, allowing users to specify new tasks using familiar signals.

We assume that the robot has been pre-programmed with a set of primitive actions. The goal is to learn a composition of these primitive actions, which we call a "task" or "high-level task" to emphasize the use of primitive actions. We are concerned with the teaching of the task, not the commanding of the task, which may also make use of Bayesian inference. In this paper we review Bayesian inference, describe its application to task learning, and work through an example of an actual robot learning a simple task. But first we start by reviewing some related work.

II. RELATED WORK

Multiple researchers have addressed the topic of learning from a human teacher, and many make use of Bayesian inference. We decompose these works into two groups based on the type of problem addressed: control vs. communication.

In a "control problem", the robot needs a mapping from sensor values to actuator controls. This mapping can be difficult to specify. Often a human can directly control the robot to perform the behavior, even though they are unable to write down the function they are using. To make use of the human's ability, a log is captured of the sensor values and control inputs as the human controls the robot. This log is then used to "learn" the mapping. This is often called imitation learning or apprenticeship learning. Some examples of control problems that have been addressed in this way are: performing helicopter aerobatics [4], navigating a corridor [5], [6], [7], and pushing obstacles [8]. This type of learning could be used to create the primitive actions that are assumed in this paper.

In what we are calling a "communication problem", the human can write down exactly what they want the robot to do. More specifically, the human can write down the sequence of primitive actions that make up a task. The problem is communicating this sequence to the robot. While a programmer could easily "code" this in, for a non-technical user this is a communication problem where the robot must extract the specification of a task from human signals.

Most of the work addressing this communication problem treats the teacher signals deterministically. For example, when the human gives a signal, the primitive action currently executing gets appended to the primitive action sequence making up the task [9], [10], [11], [12]. But, as we mentioned before, signals can often be interpreted in multiple ways, and it is only after the information from several signals is integrated that the meaning of early signals becomes clear. Because of the continually accumulating signal information, inference is needed.

To our knowledge, no one has applied Bayesian inference to this problem. The following works maintain counts for primitive actions and then use these counts to form the task, but no formal inference is performed [13], [14], [15].

In [16], reinforcement learning, specifically Q-Learning [17], is applied to this problem. The authors identify the tendency of humans to "shape" when they are teaching tasks, and encourage future work to incorporate human shaping into reinforcement learning algorithms. Their suggestion is to add another button for the teacher to indicate when they are "shaping", thereby removing the uncertainty. This solution will work for the reinforcement learning case, where a reinforcement is the only signal, but does not scale well to multiple signal types. This type of modification to the teacher signal is unnecessary with Bayesian inference, as we show in our demonstration below.

A technique which bears similarity to our research, but addresses a different problem, is Bayesian robot programming (BRP) [8], in which the authors use probability to address traditional programming. Instead of defining the preconditions for switching from one behavior to another, under BRP the programmer specifies what the robot is likely to see when it is executing a behavior. Bayes rule is then used to express the distribution over which behavior should be run in terms of what is likely to be seen for each behavior. This is not the use of Bayesian inference we propose in this paper, i.e. inferring task specifics from a human teacher.

Also, in our demonstration below, we use a teacher-applied reinforcement as a signal. Many works have used teacher-applied reinforcements [16], [14], [15], [10], [11], etc., but none with Bayesian inference.

III. BAYESIAN INFERENCE

Bayesian inference is a technique for estimating unobservable quantities from observable quantities. In this section we give an overview of Bayesian inference, beginning with some definitions.

We use p(·) as notation for both probability density functions and probability mass functions. p(X) is shorthand for p(X = x), where x is some event in the domain of the random variable X. If X is a time varying random variable, we use X_t to indicate the value of X at time t, and we define the shorthand X^t to mean (X_t, X_{t-1}, ..., X_1). Finally, we define X̂^t to be X^t with X_0 added on.

In its simplest form, Bayesian inference is just Bayes rule. Bayes rule allows you to update your belief about a hidden quantity H given an observed quantity O, and is defined as

    p(H|O) = p(O|H) × p(H) / p(O)    (1)
           ∝ p(O|H) × p(H)           (2)

p(O|H) specifies the probability of measuring o given H = h. It is called the "measurement model" and is generally easier to specify than p(H|O), which is why Bayes rule is useful. p(H) is called the prior distribution over H and represents the belief about H before O is measured. The second line follows from the first and the fact that p(O) is a constant, since we know the value of O.
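To make Eqs. (1)-(2) concrete, the following short Python sketch applies the update to a toy example of our own construction; the hidden variable, its candidate values, the prior, and the measurement model are all illustrative assumptions and do not come from the paper.

```python
# A minimal sketch of the Bayes-rule update in Eqs. (1)-(2).
# The variable names, prior, and measurement model below are illustrative
# assumptions, not values from the paper.

# Hypothetical hidden variable H: which of three tasks the teacher intends.
prior = {"task_A": 0.5, "task_B": 0.3, "task_C": 0.2}           # p(H)

# Hypothetical measurement model p(O = "nod" | H): how likely the teacher
# is to nod if each task were the intended one.
likelihood_nod = {"task_A": 0.8, "task_B": 0.4, "task_C": 0.1}  # p(O|H)

# Eq. (2): the unnormalized posterior is p(O|H) * p(H).
unnormalized = {h: likelihood_nod[h] * prior[h] for h in prior}

# Eq. (1): dividing by p(O) = sum over H of p(O|H) * p(H) normalizes it.
p_O = sum(unnormalized.values())
posterior = {h: w / p_O for h, w in unnormalized.items()}

print(posterior)  # roughly {'task_A': 0.741, 'task_B': 0.222, 'task_C': 0.037}
```

Multiplying the assumed measurement model by the assumed prior and normalizing shifts belief toward the hypotheses that best explain the observation. The text that follows refers to a motion model and to Eq. (4), the recursive extension of this update to time varying variables; that derivation is not part of this excerpt, but a sketch of the implied recursion is given after the next two paragraphs.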
p(H_t | Ĥ^{t-1}, O^{t-1}) is called the "motion model" and specifies the motion of the time varying, hidden random variable H. In the context of an update to the posterior distribution, p(Ĥ^{t-1} | O^{t-1}), which is the posterior distribution from the previous time step, is also called the prior distribution. We try to make it clear when we mean the prior distribution from the previous time step or the prior distribution at time zero, p(H_0).

Given a set of measurements O^t, and assuming H is discrete, one way to compute p(Ĥ^t | O^t) for a specific assignment to (H_t, H_{t-1}, ..., H_1, H_0) is to start from p(H_0) and recursively apply Eq. (4). This process can be done for each of the N^{t+1} assignments to (H_t, H_{t-1}, ..., H_1, H_0), where N is the number of values that H can take on. The complete set of N^{t+1} assignments is called the posterior space. We will discuss techniques for dealing with the exponential growth of the posterior space in the "Discussion" section below.
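Eqs. (3) and (4), which the recursion above refers to, do not appear in this excerpt. The sketch below therefore assumes the standard recursive decomposition implied by the surrounding definitions, namely that the posterior over a history is proportional to the previous posterior times the motion model times the measurement model, and it enumerates the full posterior space of N^{t+1} assignments by brute force. The variable names and the toy prior, motion, and measurement models are our assumptions, not the paper's.

```python
from itertools import product

# A brute-force sketch of recursive Bayesian inference over the full posterior
# space. Eqs. (3)-(4) are not shown in this excerpt, so the recursion assumed
# here is the standard decomposition implied by the surrounding definitions:
#   p(H-hat^t | O^t)  is proportional to
#   p(O_t | H_t) * p(H_t | H-hat^{t-1}, O^{t-1}) * p(H-hat^{t-1} | O^{t-1}).

H_VALUES = ["a", "b"]                     # N = 2 possible values of H (illustrative)

def prior(h0):                            # p(H_0), assumed uniform
    return 1.0 / len(H_VALUES)

def motion_model(h_t, history):           # p(H_t | H-hat^{t-1}, O^{t-1}); assumed to
    return 0.9 if h_t == history[-1] else 0.1   # depend only on the previous value

def measurement_model(o_t, h_t):          # p(O_t | H_t); a toy noisy observation of H
    return 0.8 if o_t == h_t else 0.2

def posterior_space(observations):
    """Return p(H-hat^t | O^t) for every assignment (H_0, ..., H_t)."""
    t = len(observations)
    weights = {}
    for assignment in product(H_VALUES, repeat=t + 1):   # all N^(t+1) assignments
        w = prior(assignment[0])
        for step in range(1, t + 1):
            w *= motion_model(assignment[step], assignment[:step])            # motion
            w *= measurement_model(observations[step - 1], assignment[step])  # measurement
        weights[assignment] = w
    total = sum(weights.values())         # normalizing turns the weights into a distribution
    return {a: w / total for a, w in weights.items()}

posterior = posterior_space(["a", "a", "b"])
most_likely = max(posterior, key=posterior.get)
print(most_likely, round(posterior[most_likely], 3))
```

The enumeration is only practical for very small N and t, which is exactly the exponential growth of the posterior space deferred to the paper's "Discussion" section; a practical implementation would prune or approximate this space, but the brute-force form makes the structure of the update explicit.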
In general, Bayesian inference can be applied to problems with any number of discrete or continuous, time varying or static, hidden and observed random variables, e.g. p(H^t, I, J, ... | O^t, P, Q^t, ...). As before, the application of Bayes rule followed by the definition of conditional probability can be used to decompose this posterior. In general, in order to use Bayesian inference the following must be specified:

Necessary Components for Bayesian Inference
1) Define the hidden random variables.
2) Define the observable random variables.
3) Specify a measurement model for each observable random variable.
4) Specify a motion model for each time varying hidden random variable.
5) Specify the prior distribution over the hidden random variables at time zero.

Next we frame high-level task learning as a Bayesian inference problem.

IV. BAYESIAN INFERENCE APPLIED TO LEARNING HIGH-LEVEL TASKS

The problem of learning high-level tasks from a human teacher involves inferring task details from signals emitted by the teacher.
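Section IV is cut off at this point in the excerpt, so the paper's own formulation is not available here. Purely as an illustration of how the five components listed above might be instantiated for this problem, the sketch below treats the hidden variable as the intended sequence of primitive actions and the observed variable as a teacher reinforcement signal, which matches the setup described in the abstract and Section II; the candidate tasks, the primitive action names, and the numerical models are hypothetical assumptions of ours.

```python
# Hypothetical instantiation of the five components for high-level task
# learning. Everything below is an illustrative assumption; the paper's own
# Section IV formulation is not included in this excerpt.

# Assumed primitive actions the robot has been pre-programmed with.
PRIMITIVE_ACTIONS = ["pick_up", "move_to_bin", "put_down"]

# 1) Hidden random variable: the intended task, represented as a candidate
#    sequence of primitive actions.
CANDIDATE_TASKS = [
    ("pick_up", "move_to_bin", "put_down"),
    ("pick_up", "put_down"),
    ("move_to_bin",),
]

# 2) Observable random variable: the teacher signal after each executed
#    action, here just a reinforcement ("reward") or its absence ("none").
SIGNALS = ["reward", "none"]

# 3) Measurement model: the teacher is assumed more likely to reinforce an
#    action that belongs to the intended task.
def signal_likelihood(signal, executed_action, task):
    p_reward = 0.7 if executed_action in task else 0.1
    return p_reward if signal == "reward" else 1.0 - p_reward

# 4) Motion model: the intended task is assumed not to change during
#    teaching, so p(task_t | task_{t-1}) = 1 when they are equal and the
#    factor is omitted from the update below.

# 5) Prior over the hidden task at time zero: uniform over the candidates.
def task_prior(task):
    return 1.0 / len(CANDIDATE_TASKS)

# One Bayes-rule update after the robot executes an action and the teacher
# responds with a signal.
def update(belief, executed_action, signal):
    unnormalized = {task: b * signal_likelihood(signal, executed_action, task)
                    for task, b in belief.items()}
    total = sum(unnormalized.values())
    return {task: w / total for task, w in unnormalized.items()}

belief = {task: task_prior(task) for task in CANDIDATE_TASKS}
belief = update(belief, "move_to_bin", "reward")
print(belief)  # belief shifts toward the candidate tasks containing "move_to_bin"
```

With a static task the motion model contributes a factor of one, so each teaching step reduces to the Bayes-rule update of Eqs. (1)-(2) applied to the current belief.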