
Beyond Clever Hans: Learning From People Without Their Really Trying

by

Vivek Veeriah

A thesis submitted in partial fulfillment of the requirements for the degree of

Master of Science

Department of Computing Science

University of Alberta

© Vivek Veeriah, 2017

Abstract

Facial expressions and other body language are important for human communication. They complement speech and make the process of communication simple and sustainable. However, communication through existing approaches to human-machine interaction is not as intuitive as human communication. Specifically, the existing approaches to human-machine interaction do not learn from the subtle non-verbal cues produced by a user. Many of the existing approaches map body language cues to instructions or rewards and use them to train a machine. These mappings are not learned from ongoing interactions and are assumed to be defined by some external source. As a consequence, communicating through these approaches can place significant cognitive load on the user. This is an important problem that needs to be addressed if we are to amplify our existing cognitive and physical capabilities through intelligent machines. Towards addressing this, we introduce an approach that allows machines to learn from whatever subtle cues people produce during the process of interaction. In particular, the agent learns a value function that maps the user's non-verbal cues to later rewards. By using this value function, the user can teach their agent to complete a task according to their preferences, and consequently maximize their satisfaction. We demonstrate this by training an agent using facial expressions. Furthermore, we show that these learned value functions can be successfully transferred across tasks. In conclusion, our approach is the first that allows people to teach their machines using whatever subtle cues they produce, and it could take us far in achieving sustainable forms of human-machine interaction.

To Mom and Dad

Acknowledgements

I would like to thank my supervisors, Richard S. Sutton and Patrick M. Pilarski, for being amazing mentors and for introducing me to this exciting field of reinforcement learning. In particular, I am thankful to Rich for showing me how to critically develop my own thoughts, and more importantly, for inspiring me to pursue ambitious and important ideas. I would like to thank Patrick for his never-ending enthusiasm towards research and for his consistent encouragement. His excitement has always been contagious! Further, I would like to thank Pooria Joulani, Craig Sherstan, Rupam Mahmood, Roshan Shariff, Marlos Machado, Kenny Young, Tian Tian, Eric Graves, Gautham Vasan and Sanket Kumar Singh for all their help along the way. My experiences at the University of Alberta and at the RLAI Lab would not have been as joyful without them.

Table of Contents

1 Beyond Clever Hans
  1.1 Horse that Answered Questions
  1.2 Human-Machine Interaction using Body Language
  1.3 Moving Beyond Clever Hans
  1.4 Outline
  1.5 Contributions

2 Background
  2.1 The Reinforcement Learning Problem
  2.2 Value Functions
  2.3 Temporal-Difference (TD) Learning
    2.3.1 Prediction Algorithm: TD(λ) Learning
    2.3.2 Control Algorithm: Sarsa(λ) algorithm
  2.4 Actor-Critic Methods
  2.5 Linear Function Approximation Using Tile Coding

3 Existing Approaches to Human-Machine Interaction
  3.1 Interactions in the Form of Rewards
  3.2 Interactions in the Form of Demonstrations and Instructions
  3.3 Relevance of these Related Works to this Thesis

4 Learning From Prospective Body Language
  4.1 Towards Natural Forms of Human-Machine Interaction
  4.2 Our Approach: Learning from Prospective Body Language Cues
  4.3 Existing Approaches that Use Body Language
    4.3.1 Body Language as Instructions to Machines
    4.3.2 Body Language as Rewards to Machines
  4.4 Comparison to Existing Human-Machine Interaction Approaches

5 Using Sarsa To Learn From Prospective Body Language
  5.1 Implementation using Sarsa
    5.1.1 State Representation: Processing Raw Images into Facial Expression Features
    5.1.2 Sarsa Learning Algorithm
  5.2 Grip-Selection Task
    5.2.1 State Space
    5.2.2 Action Space
    5.2.3 User-Generated Rewards
  5.3 Experiments and Results
    5.3.1 Experiment 1: Multiple Grip and Object Setting
    5.3.2 Experiment 2: Infinite Object Setting
  5.4 Discussion

6 Using Actor-Critic To Learn From Prospective Body Language
  6.1 Implementation using an Actor-Critic Method
  6.2 Experiments and Results
  6.3 Discussion

7 Discussions and Extensions: From Facial Expressions to Body Language
  7.1 What is Missing in Existing Human-Machine Interaction Approaches?
  7.2 How Does Our Approach Differ from Approaches that Learn from User-Generated Rewards?
  7.3 Do People Produce Non-Verbal Cues While Interacting With Their Machines?
  7.4 How Does Our Approach Relate to Affective Computing?
  7.5 Better Feature Representations for Nuanced Body Language
  7.6 Integrating Different Modalities

8 Conclusion

Bibliography

List of Figures

2.1 The Reinforcement Learning problem
2.2 Actor-Critic Architecture
2.3 Tile Coding

3.1 Interactive Shaping

4.1 Learning from Prospective Body Language

5.1 State Representation for Face-Valuing Agent
5.2 Grip-Selection Task
5.3 Learning Performance of Sarsa Face-Valuing Agent
5.4 Total User-Generated Rewards Provided to Sarsa Learning Agent
5.5 Total Time Steps Taken to Complete the Infinite Object Setting by Sarsa Learning Agents

6.1 Learning Performance of Actor-Critic Face-Valuing Agent

Chapter 1

Beyond Clever Hans

The overarching goal of this thesis is to introduce an approach to human-machine interaction where the machine learns from whatever subtle non-verbal cues the user produces during the process of interaction. We begin by introducing the story of Clever Hans, a popular example that illustrates the role of body language in communication. Subsequently, we broadly discuss how existing approaches use body language for human-machine interaction. Finally, we introduce our approach and highlight our contributions in this thesis.

1.1 Horse that Answered Questions

Wilhelm von Osten was a math teacher and an amateur horse trainer. He owned a horse called Hans. In the year 1891, von Osten displayed his horse in public and claimed that he had taught his horse to perform simple arithmetic calculations, tell time, and keep track of the calendar. Hans answered many of the questions posed by the questioner and, over time, became a public sensation at the turn of the twentieth century. As a result of this popularity, the horse was called "Clever Hans" and was reported on by the New York Times in 1904. The significant public interest in Clever Hans motivated the German board of education to appoint a commission to investigate whether the horse possessed intelligence. The commission recruited a German psychologist named Oskar Pfungst, who conducted a substantial number of trials with Hans and

discovered that Hans got the right answer only when the questioner knew what the answer was. More importantly, Hans needed to see the questioner while answering their questions. Pfungst then studied the behavior of the questioner while the horse was answering their questions and reported his findings (c.f. Pfungst, 1911). He discovered that the horse's behavior was influenced by subtle and unintentional cues. Specifically, as Hans approached the correct answer, the questioner's posture and facial expressions changed in ways that were consistent with an increase in tension, which was released when Hans made the final tap with its hoof. This change in body language served as a useful cue for the horse to stop tapping its hoof. Hans learned when to stop tapping its hoof by observing the subtle non-verbal cues of the questioner. This ability of the horse was wrongly attributed to intelligence. While training his horse, von Osten rewarded the horse whenever it answered a question correctly, which reinforced this process of answering from the questioner's subtle cues.

1.2 Human-Machine Interaction using Body Language

Ideal approaches to human-machine interaction should allow users to communicate naturally, as they would with their peers. This would make the process of interaction between people and machines sustainable and scalable. As an important step towards this ambitious goal of ideal human-machine interaction, we need to design approaches that allow machines to learn from whatever non-verbal cues are produced during the interaction, similar to how the horse picked up on the unintentional non-verbal cues of its questioner to answer their questions. In human communication, the subtle non-verbal cues that we produce unintentionally influence the observer's behavior towards us. This is a natural process and is well-studied in psychology (De Gelder, 2006; Pezzulo et al., 2013; Jack & Schyns, 2015) and in affective neuroscience (Adolphs, 2002; Whalen et

al., 2013). Specifically, these body language cues have communicative value to their observers, and allowing machines to learn from our subtle non-verbal cues seems a natural direction to pursue in human-machine interaction. The existing approaches to human-machine interaction do not use body language as training information. Specifically, these approaches assume that the body language cues and their meaning are given to the machine by some external source, instead of being learned from ongoing interactions. The existing approaches use non-verbal cues for either instructing or rewarding the machines. Particularly, the system designer maps body language cues to instructions, which the machine performs on observing the corresponding cue from the user. Some approaches also map body language to rewards which are provided to the machine. Using body language as instructions or rewards requires the user to translate their non-verbal cues to effectively teach the machine according to their preferences. The approaches that use body language as instructions or rewards assume that people produce similar body language during interaction. This is not a reasonable assumption to make because body language differs significantly across people, usually depending on their situations and cultural norms (Roselli & Ardila, 2003; Yammiyavar et al., 2008). Requiring the user to actively translate their cues in order to meet the machine's design can cause significant cognitive load on the user, especially in real-world human-machine interaction tasks (Hollender et al., 2010; Fridman et al., 2017; Mathewson & Pilarski, 2017). As a consequence, the existing approaches are unsustainable and lead to unsuccessful human-machine partnerships.

1.3 Moving Beyond Clever Hans

In this thesis, we introduce a human-machine interaction approach that is inspired by Clever Hans and investigate whether it brings us closer to natural human-machine collaboration. In human-machine interaction domains, people naturally produce subtle non-verbal cues to indicate their satisfaction towards their machines. In the

approach that we introduce here, the machine learns to associate its user's subtle cues with their satisfaction and subsequently uses this to learn an appropriate behavior in a given task. More specifically, the machine learns an evaluation function that predicts its user's satisfaction from their non-verbal cues. By using these predictions, the machine learns to adapt its behavior in a given task according to its user's preferences and, as a result, maximize user satisfaction. Here, we view the machine's performance in a task as a measure of user satisfaction. We formalize and implement our approach using reinforcement learning. In particular, from reinforcement learning, we use techniques for learning value functions. In our approach, the machine learns a value function that associates the user's non-verbal cues with their satisfaction, which is measured in terms of their occasional explicit feedback. The machine predicts its user's satisfaction using this value function, and as a result the machine perceives a feedback signal in the form of body language cues. This evaluative signal drives the machine's behavior within a given task according to its user's preferences, thereby maximizing their satisfaction. As opposed to existing approaches that use body language in human-machine interaction, our approach does not require the system designer to define the meaning of these cues for the machine. Specifically, the machine learns the meaning of the various non-verbal cues directly from the ongoing interactions with its user. Another important distinction of our approach is that the machine learns from ongoing interactions to predict user satisfaction directly from their body language and does not require a supervised training dataset. This is more general than mimicking Clever Hans in a human-machine interaction setting, because Hans only learned when to stop tapping its hoof from subtle cues, as opposed to learning how much the user was satisfied. More importantly, by learning to predict user satisfaction, our approach allows the machine to perform tasks that are truly important to the user. As our approach relies on whatever subtle cues are produced during the process of interaction, it does not require the user to translate or exaggerate their non-verbal cues. The approach learns from whatever subtle non-verbal cues are

produced by the user while interacting with their machine. We believe this to be the first work that directly addresses the overall goal of this thesis, in which an implemented system learns from people's body language without requiring much effort from them.

1.4 Outline

Here, we give a brief outline of the rest of this thesis:

Chapter 2 introduces a few concepts from reinforcement learning that are used in Chapters 4 and 5.

Chapter 3 presents background on existing methods used for human-machine interaction.

Chapter 4 introduces our prospective body language approach, which allows users to teach their machines using whatever subtle body language cues are produced during the process of interaction. This chapter also grounds our contribution relative to the existing approaches that allow users to teach their machines using body language. This chapter contains the fundamental contribution of this thesis.

Chapters 5 and 6 present experimental results for our approach of learning from prospective body language, implemented using two popular reinforcement learning techniques. The results show that our approach can improve over a conventional user-interaction agent in difficult and changing tasks.

Lastly, Chapter 7 discusses a few common concerns related to our approach and presents future directions of research.

1.5 Contributions

The key contributions of this thesis are summarized as follows:

• We introduce our interactive machine learning approach that allows people to teach machines using subtle body language cues, without requiring much effort. We view this as an initial step towards natural and intuitive human-machine collaboration.

• We further develop our approach and implement it using two different reinforcement learning techniques and empirically evaluate them on a simulated grip-selection domain.

• We discuss extensions to our approach as potential future directions towards achieving simple human-machine interactions.

Chapter 2

Background

In this chapter, we introduce and define the concepts from reinforcement learning, along with the necessary algorithms, that are required for understanding this thesis.

2.1 The Reinforcement Learning Problem

The reinforcement learning problem involves an agent that continually interacts with its environment in order to achieve a certain goal, where this goal is defined in terms of a reward signal.

Figure 2.1: The Reinforcement Learning problem: Interaction between an agent and its environment

A simple representation of the RL problem is shown in Figure 2.1. At each time step, the agent observes its current state, then chooses an action that influences the environment. In response, the environment provides a reward and a next state to the agent. This process repeats continually as the agent interacts with its environment. The objective for the agent is to learn to select actions so as to maximize this numerical reward signal.
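To make the interaction loop above concrete, the following is a minimal Python sketch of one episode of agent-environment interaction. The `env` and `agent` interfaces (`reset`, `step`, `choose_action`, `learn`) are illustrative assumptions and not part of this thesis.

```python
def run_episode(env, agent):
    """One episode of the agent-environment interaction loop (illustrative interfaces)."""
    state = env.reset()                      # observe the initial state
    done = False
    total_reward = 0.0
    while not done:
        action = agent.choose_action(state)              # pick an action for the current state
        next_state, reward, done = env.step(action)      # environment responds with reward and next state
        agent.learn(state, action, reward, next_state, done)
        total_reward += reward                           # accumulate the reward signal to be maximized
        state = next_state
    return total_reward
```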

A reinforcement learning problem is represented as a Markov Decision Process (MDP) with a finite set S of N states and a finite set A of actions, with a discount factor γ ∈ [0, 1]. At each time step t, the agent observes its current state S_t ∈ S, where S is the set of all possible states and |S| = N is the total number of states available in the environment. After observing its current state S_t, the agent chooses an action A_t ∈ A. This action influences the environment in producing a scalar reward R_{t+1} ∈ ℝ. This reward signal R_{t+1} is generally a function of the agent's current state S_t and action A_t. Following this reward, the agent transitions into the next state S_{t+1} ∈ S, and this entire interactive process repeats continually. The expected value of this reward signal, r(s, a), is defined as the average value of the rewards observed at a particular state s after taking a particular action a:

$$ r(s, a) = \mathbb{E}\big[\, R_{t+1} \mid S_t = s, A_t = a \,\big] $$

The environment, after receiving an action a from the agent, transitions from the current state s to the next state s′, where this transition is defined by a probability function p(s′|s, a). Specifically, this transition probability function gives the likelihood of the agent transitioning into a state s′ from the current state s after picking an action a.

$$ p(s' \mid s, a) = \Pr\big\{\, S_{t+1} = s' \mid S_t = s, A_t = a \,\big\} $$

It is important to note here that the transition probability p(s′|s, a) and the reward function r(s, a) are specific to an environment and are unknown to the learning agent. The goal for a reinforcement learning agent is to maximize the scalar rewards it receives over time, through ongoing interactions with the environment in the form of state observations and actions. In order to achieve this, the agent needs to learn to pick actions that eventually produce higher rewards and to map these actions to its current situations (i.e., the states). This association of states to actions is formally called a policy and is denoted as

π(a|s). The policy gives the likelihood of picking an action a on observing a state s. Finally, γ ∈ [0, 1] is a discount parameter representing the relative importance of future rewards compared to the importance of the immediate reward. The goal for the agent is to maximize the discounted sum of rewards that it receives over time by learning an action-selection policy π. This discounted sum of rewards is called the return, which is formally defined as:

$$ G_t = \sum_{i=0}^{\infty} \gamma^i R_{t+i+1} $$

When γ is closer to 0, the return assigns a higher weight to immediate rewards than to those received later in the future. In other words, when the discount factor is close to 0, the agent learns a myopic policy that maximizes immediate rewards. Setting a discount factor closer to 1 allows the agent to learn a policy that maximizes future rewards.
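As a concrete illustration (not from the thesis), consider an environment that emits a constant reward of +1 at every step. The return is then

$$ G_t = \sum_{i=0}^{\infty} \gamma^i \cdot 1 = \frac{1}{1 - \gamma}, $$

so a discount factor of γ = 0.5 gives a return of 2, while γ = 0.99 gives a return of 100: the larger the discount factor, the more weight is placed on rewards far in the future.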

2.2 Value Functions

A value function is the fundamental idea in reinforcement learning (Sutton & Barto, 2017). The central idea behind a value function is to cache the utility

or knowledge of a state s in a single function v_π(s), which can be used by the agent in achieving its overall goal. The general approach for solving a reinforcement learning problem involves learning this value function. This is achieved through a class of algorithms called Temporal-Difference (TD) learning. At its core, TD incrementally learns a value estimate of a state by bootstrapping from the estimates of its succeeding states. A value function is defined as an estimate of the discounted sum of rewards that would be achieved by the agent by following the current action-selection policy π. Informally, a value function tells the agent how "good" its action-selection policy is. For a given policy π, we can now define the state-value

function vπ(s) as a function that maps a state s to its expected return, that

would be obtained by starting from the state s and following the policy π. Mathematically, the value function is defined as:

$$ v_\pi(s) = \mathbb{E}_\pi\Big[\sum_{i=0}^{\infty} \gamma^i R_{t+i+1} \,\Big|\, S_t = s\Big] \qquad (2.1) $$

The state-value function can be written in a recursive form based on the values of the subsequent states:

$$ \begin{aligned} v_\pi(s) &= \mathbb{E}_\pi\Big[\sum_{i=0}^{\infty} \gamma^i R_{t+i+1} \,\Big|\, S_t = s\Big] \\ &= \mathbb{E}_\pi\Big[R_{t+1} + \gamma \sum_{i=1}^{\infty} \gamma^{i-1} R_{t+i+1} \,\Big|\, S_t = s\Big] \\ &= \sum_{a \in \mathcal{A}} \pi(a|s) \sum_{s' \in \mathcal{S}} \sum_{r \in \mathcal{R}} p(s', r \mid s, a) \Big( r + \gamma\, \mathbb{E}_\pi\Big[\sum_{i=1}^{\infty} \gamma^{i-1} R_{t+i+1} \,\Big|\, S_{t+1} = s'\Big] \Big) \\ &= \sum_{a \in \mathcal{A}} \pi(a|s) \sum_{s' \in \mathcal{S}} \sum_{r \in \mathcal{R}} p(s', r \mid s, a) \big[\, r + \gamma v_\pi(s') \,\big] \qquad (2.2) \end{aligned} $$

This is called a state-value function because it does not take into account the particular action a that is picked by the agent, using the policy π, at every time step. Although the return obtained by the agent depends on the action a it chooses in a given state s, this value function marginalizes over the effect of picking that action in the given state. For this reason, it is called the state-value function. An obvious extension of this would be the state-action value function or

the action-value function. Informally, the action-value function qπ(s, a) maps a state-action pair (s, a) to the expected return starting from state s, taking the action a and following the policy π thereon:

$$ q_\pi(s, a) = \mathbb{E}_\pi\Big[\sum_{i=0}^{\infty} \gamma^i R_{t+i+1} \,\Big|\, S_t = s, A_t = a\Big] $$

This can also be written as a recursive equation based on the action values of the states that follow the current state s:

$$ q_\pi(s, a) = \sum_{s' \in \mathcal{S}} \sum_{r \in \mathcal{R}} p(s', r \mid s, a) \Big[\, r + \gamma \sum_{a' \in \mathcal{A}} \pi(a'|s')\, q_\pi(s', a') \,\Big] \qquad (2.3) $$

2.3 Temporal-Difference (TD) Learning

As previously described, one of the central ideas in reinforcement learning is the estimation of value functions. The learning agent needs to estimate the value of a particular state or state-action pair in order to improve its policy π. The value function estimates the discounted sum of future rewards, as shown in Equation 2.1. More importantly, TD forms the fundamental basis for many prediction and control algorithms applied in reinforcement learning. Sutton (1988) introduced a class of algorithms for learning these value functions, called temporal-difference (TD) learning algorithms. These TD algorithms are a significant contribution to the field of reinforcement learning because they allow a value function to be learned incrementally, making them suitable for online learning. These TD learning algorithms also support the use of function approximation and bootstrapping. Function approximation is a technique for representing a state, when there are uncountably many states, using a tuneable parameterized function. Generally, the number of parameters of this function approximator is negligible compared to the number of states present in an environment. The simplest approach for estimating a value function over many states is to create a look-up table with one entry per state and update the entries based on Equation 2.2. For learning an action-value function, the updates are based on Equation 2.3. However, it is difficult to represent all possible states (or state-action pairs) as a look-up table in many realistic settings. In order to handle such large domains, the technique of function approximation was introduced. Compressing the space of all states into a tractable and computationally efficient representation provides the additional advantage of generalizing across similar states, which can accelerate learning. Some well-known examples of function approximation in reinforcement learning are linear function approximators, kernels and neural networks. Bootstrapping is a term used in TD learning and has the same meaning as in dynamic programming. Bootstrapping is an efficient way of updating estimates from other predictions. Usually, this means that we can

make updates to the estimates of states before their actual outcomes are observed. However, this can skew the learned estimates, thereby introducing bias. Without bootstrapping, the estimates are updated based on the actual outcomes that are observed. Such algorithms are called Monte Carlo methods and occupy the opposite end of the spectrum from TD algorithms. Monte Carlo methods make learning updates only at the end of an episode, when the final state is reached. These methods suffer from high variance in their estimates, especially in stochastic domains where the returns vary widely. On the other hand, TD algorithms can make incremental learning updates at each time step through bootstrapping. As a direct consequence of bootstrapping, TD algorithms have less variance in their estimates.

Algorithm 1 TD(λ) Learning Algorithm

INPUT: α, λ, w_init
w ← w_init            ▷ w is the weight vector for the state-value function
e_w ← 0               ▷ e_w is the eligibility trace for the state-value function
for num. of episodes do
    obtain initial state S
    φ ← feature vector corresponding to S
    while S is not terminal do
        obtain next state S′ and reward R
        φ′ ← feature vector corresponding to S′
        δ ← R + γ w⊤φ′ − w⊤φ
        e_w ← γλ e_w + φ
        w ← w + α δ e_w
        S ← S′
        φ ← φ′
    end while
end for
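The following is a minimal Python sketch of the backward-view TD(λ) update used in Algorithm 1, with linear function approximation and accumulating traces. The feature vectors and the surrounding episode loop are assumed to be supplied by the experiment code.

```python
import numpy as np

def td_lambda_update(w, e, phi, phi_next, reward, alpha, gamma, lam, terminal=False):
    """One backward-view TD(lambda) update for a linear state-value function."""
    v = np.dot(w, phi)                               # current estimate v(S) = w^T phi(S)
    v_next = 0.0 if terminal else np.dot(w, phi_next)
    delta = reward + gamma * v_next - v              # TD error
    e = gamma * lam * e + phi                        # accumulating eligibility trace
    w = w + alpha * delta * e                        # move the weights along the trace
    return w, e
```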

TD algorithms are formulated as forward view and backward view methods. The forward view of TD looks many time steps into the future, then computes and makes incremental learning updates to its weight vectors. This forward view of TD cannot be naïvely implemented as an online learning algorithm because it relies on information extending many time steps into the future. It was introduced as a theoretical framework for studying and designing TD algorithms (Sutton, Mahmood & White, 2016; van Seijen et al., 2016; van

Seijen, 2016). The backward view of TD uses information at the current time step and makes incremental updates to the weight vector. More importantly, the backward view of TD can be implemented in online settings. In expectation, the predictions made by the backward and forward views of TD are equal.

For computational feasibility, the backward view of TD maintains an eligibility trace vector e ∈ ℝ^n, such as the accumulating trace (Equation 2.4) or the replacing trace (Equation 2.5):

$$ e_t = \gamma\lambda\, e_{t-1} + \phi_{t-1} \qquad (2.4) $$

$$ e_t = \max(\gamma\lambda\, e_{t-1}, \phi_{t-1}) \qquad (2.5) $$

2.3.1 Prediction Algorithm: TD(λ) Learning

The TD(λ) algorithm is one of the most effective prediction algorithms in reinforcement learning. The algorithm predicts the return G_t achieved by an agent following a particular action-selection policy and was introduced in Sutton (1988). It is summarized in Algorithm 1. This algorithm uses a parameter λ ∈ [0, 1] for trading off bias and variance. Specifically, by selecting a value for λ, the resulting TD algorithm lies somewhere between a full bootstrapping method (λ = 0) and a Monte Carlo method (λ = 1). The learning performance of an agent making predictions using TD depends on this tuneable parameter and is domain-specific. The goal of the algorithm is to learn a set of weights w ∈ ℝ^n such that the value can be estimated accurately for a given state s ∈ S, where this state is represented by its feature vector φ(s) ∈ ℝ^n. The estimated value of this state is given by a simple dot product between the weights and the features of the state:

$$ \hat{v}(S, w) = w^\top \phi(S) $$

The state s ∈ S is represented by a simple feature vector φ(s) ∈ ℝ^n. One could also use a nonlinear function approximator. However, linear function approximation is computationally efficient and is sufficient for understanding the rest of this thesis. The term δ in Algorithm 1 is called the Temporal-Difference (TD) error. It is defined as the difference between the bootstrapped target estimate (i.e.,

R_{t+1} + γ v̂(S_{t+1}, w)) and the current estimate of the state. When TD makes a perfect prediction for a given state, this TD error is zero. The weight update rule for the TD algorithm in Algorithm 1 is derived from the standard stochastic gradient descent algorithm. The term α is a step-size parameter that controls how far to move in the direction of the gradient at a given time step. This step-size parameter is subject to certain constraints so that the algorithm can converge to a local solution. In practice, this α parameter is set to a small constant value, allowing the agent to learn and adapt to non-stationary (i.e., changing) environments.

2.3.2 Control Algorithm: Sarsa(λ) algorithm

TD learning methods are useful for making predictions about the future rewards received by the agent interacting with an environment under a given action-selection policy. An interesting extension of the TD algorithm is to learn state-action values, which implicitly represent an action-selection policy π that maximizes the rewards achieved by the learning agent from its interactions. Algorithms that learn how to select actions are referred to as control algorithms, and one such algorithm is summarized in Algorithm 2. An action-value method learns a policy π that maps states to actions. This policy is represented implicitly through action values corresponding to state-action pairs. Sarsa is a popular TD control algorithm that starts out by evaluating the utility of following its current action-selection policy. While choosing an action, the agent picks the action that has the largest action-value estimate, and this action is called the greedy action. The agent can also randomly pick a non-greedy action with a certain probability ε, and such a policy is called an ε-greedy policy.

Algorithm 2 Sarsa(λ) Learning Algorithm with Accumulating Traces

INPUT: α, λ, γ, w_init
w ← w_init            ▷ w is the weight vector for the action-value function
e_w ← 0               ▷ e_w is the eligibility trace for the action-value function
for num. of episodes do
    obtain initial state S
    select action A based on state S (for example, ε-greedy)
    φ ← features corresponding to S, A
    while S is not terminal do
        take action A, observe next state S′ and reward R
        select action A′ based on state S′
        φ′ ← features corresponding to S′, A′
        δ ← R + γ w⊤φ′ − w⊤φ
        e_w ← γλ e_w + φ
        w ← w + α δ e_w
        S ← S′
        A ← A′
        φ ← φ′
    end while
end for

Specifically, the agent selects a greedy action with probability 1 − ε and a non-greedy action with probability ε. This ε parameter can take a value between 0 and 1 and ensures that the learning agent occasionally explores by taking a non-greedy action. In non-stationary environments, the action that results in higher rewards slowly drifts over time, and using an ε-greedy policy can lead the agent to achieve better rewards in these cases.
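As an illustration of the ε-greedy action selection described above, the following Python sketch picks an action from linear action-value estimates; the `features_for` function, which returns φ(s, a), and the random generator `rng` are assumed to be provided.

```python
import numpy as np

def epsilon_greedy(w, features_for, state, actions, epsilon, rng):
    """Select an action under an epsilon-greedy policy over q(s, a) = w^T phi(s, a)."""
    if rng.random() < epsilon:                        # explore: pick a random action
        return actions[rng.integers(len(actions))]
    q_values = [np.dot(w, features_for(state, a)) for a in actions]
    return actions[int(np.argmax(q_values))]          # exploit: pick the greedy action

# Usage sketch: rng = np.random.default_rng(0); a = epsilon_greedy(w, phi, s, [0, 1, 2], 0.1, rng)
```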

2.4 Actor-Critic Methods

In reinforcement learning, it is sometimes better to decouple the policy from the value function and learn them separately from ongoing interactions, rather than representing the policy indirectly via state-action values. Actor-critic methods meet these requirements, learning an action-selection policy from online experience. An example actor-critic architecture is shown in Figure 2.2. The simplest approach for learning a control policy is to directly extend a value function based method, like TD learning, to learn state-action values.

Figure 2.2: The Actor-Critic architecture

This results in effective control algorithms (namely, Sarsa and Q-learning). Though these control algorithms are sufficient for domains with a small number of actions, they result in poor learning performance in domains with many actions. More importantly, in many robotic domains, it is important to learn an action-selection policy that chooses real-valued actions. For example, in order to control servo motors, the agent needs to use real-valued signals. To address these concerns, actor-critic methods were introduced (Barto et al., 1983; Konda & Tsitsiklis, 2000; Bhatnagar et al., 2009). Actor-critic methods are control methods that learn an explicit policy through TD principles. They consist of two separable modules: an actor, which learns an action-selection policy, and a critic, which critiques the actions selected by the actor module. Specifically, instead of learning values for each state-action pair, actor-critic methods learn a function, called the actor, that directly maps states to actions. The critic is responsible for generating TD errors, critiquing the actions selected by the actor module. This TD error drives learning in both the actor and the critic modules. These modules can be parameterized separately with function approximators. The overall actor-critic method is summarized in Algorithm 3.

Algorithm 3 Actor-Critic with Accumulating Traces

INPUT: α_v, α_π, λ, γ, w_init, θ_init
w ← w_init            ▷ w is the weight vector for the critic
θ ← θ_init            ▷ θ is the weight vector for the actor
e_w ← 0               ▷ e_w is the eligibility trace for the critic
e_θ ← 0               ▷ e_θ is the eligibility trace for the actor
for num. of episodes do
    obtain initial state S and φ
    while S is not terminal do
        select action A for state S (using φ)
        take action A, observe S′ (along with φ′) and R
        δ ← R + γ w⊤φ′ − w⊤φ
        e_w ← γλ e_w + φ
        w ← w + α_v δ e_w
        e_θ ← γλ e_θ + ∇_θ log π(A|S, θ)
        θ ← θ + α_π δ e_θ
        S ← S′
        φ ← φ′
    end while
end for
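The following is a minimal Python sketch of one actor-critic update with accumulating traces, corresponding to the inner loop of Algorithm 3. It assumes a softmax (Gibbs) policy over a discrete action set with linear preferences; the per-action feature vectors are assumed to be given.

```python
import numpy as np

def softmax_probs(theta, phi_sa_all):
    """Action probabilities pi(a|s) from linear preferences theta^T phi(s, a)."""
    prefs = np.array([np.dot(theta, phi_sa) for phi_sa in phi_sa_all])
    prefs -= prefs.max()                              # subtract the max for numerical stability
    exp_prefs = np.exp(prefs)
    return exp_prefs / exp_prefs.sum()

def actor_critic_update(w, theta, e_w, e_theta, phi_s, phi_s_next, phi_sa_all,
                        action, reward, alpha_v, alpha_pi, gamma, lam, terminal=False):
    """One actor-critic update with accumulating traces (cf. Algorithm 3)."""
    v_next = 0.0 if terminal else np.dot(w, phi_s_next)
    delta = reward + gamma * v_next - np.dot(w, phi_s)    # critic's TD error
    e_w = gamma * lam * e_w + phi_s                       # critic trace and weight update
    w = w + alpha_v * delta * e_w
    probs = softmax_probs(theta, phi_sa_all)
    # gradient of log pi(A|S, theta) for a softmax policy with linear preferences
    grad_log_pi = phi_sa_all[action] - sum(p * f for p, f in zip(probs, phi_sa_all))
    e_theta = gamma * lam * e_theta + grad_log_pi         # actor trace and weight update
    theta = theta + alpha_pi * delta * e_theta
    return w, theta, e_w, e_theta
```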

2.5 Linear Function Approximation Using Tile Coding

Tile coding is a popular and simple nonlinear approach for generating a feature vector φ(s) ∈ ℝ^n. The approach generates a sparse, binary feature vector from the real-valued signals obtained as the state from the environment. Because tile coding produces a sparse feature vector from real-valued signals, it is well-suited for online learning in reinforcement learning, and it has been used in many successful robotic applications. Tile coding uses a tiling over the real-valued state space, partitioning it into non-overlapping regions called tiles, as shown in Figure 2.3. Figure 2.3 shows a two-dimensional tiling for a hypothetical real-valued state space. The tiles in a tiling do not have to be square and need not have the same resolution for different regions of the state space. Furthermore, tilings can be used with any number of dimensions in the state space. For a given tiling, the real-valued state signal is active (i.e., present) in exactly one of the tiles while all the other tiles are inactive.

Figure 2.3: Tile coding

Encoding the presence of the state signal through this tiling produces a binary vector with 0s corresponding to the inactive tiles and a 1 in the position of the active tile. The same process can be repeated for a number of tilings that are offset from each other by a predetermined or random pattern, each of them producing a single binary vector. All of these binary vectors are concatenated to form a long, sparse binary vector, which forms the feature representation φ(s) for a given state signal s ∈ S.
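To make the encoding above concrete, the following is a minimal Python sketch of tile coding for a bounded, low-dimensional real-valued state. It assumes uniformly offset square tilings; the number of tilings and tiles per dimension are illustrative choices, not values used in this thesis.

```python
import numpy as np

def tile_code(state, low, high, n_tilings=8, tiles_per_dim=10):
    """Encode a real-valued state as a sparse binary feature vector (one active tile per tiling)."""
    state = np.asarray(state, dtype=float)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    dims = len(state)
    tile_width = (high - low) / tiles_per_dim
    tiles_per_tiling = tiles_per_dim ** dims
    features = np.zeros(n_tilings * tiles_per_tiling)
    for tiling in range(n_tilings):
        offset = (tiling / n_tilings) * tile_width        # shift each tiling by a fraction of a tile
        coords = np.floor((state - low + offset) / tile_width).astype(int)
        coords = np.clip(coords, 0, tiles_per_dim - 1)    # keep the coordinates inside the grid
        index = 0
        for c in coords:                                  # row-major index of the active tile
            index = index * tiles_per_dim + c
        features[tiling * tiles_per_tiling + index] = 1.0 # exactly one active tile in this tiling
    return features
```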

Chapter 3

Existing Approaches to Human-Machine Interaction

Designing intelligent, user-interactive agents that can effectively amplify our existing cognitive and physical capabilities is one of the important promises of Artificial Intelligence. Through this, we hope to merge ourselves with AI. A principal example of this can be found in assistive rehabilitation robots, where machine learning techniques enable electromechanical systems to restore or augment biological limbs that are lost through injury or illness. Interactive machine learning and human-computer interaction are fields that are as old as computers, and it is not possible to summarize all of their key ideas in this thesis. However, we summarize some of the recent approaches in these fields, particularly those that are related to this thesis, along with their strengths and weaknesses. The existing works in interactive machine learning approach the problem from multiple perspectives. However, most of them share a common goal of allowing people to effectively communicate their intents or preferences to a machine, which then uses this information to solve a given task.

3.1 Interactions in the Form of Rewards

The simplest approach for empowering people to communicate their preferences to an interacting machine is by providing user-generated rewards. This user-generated reward augments or shapes the existing reward signal produced

by an environment within which the agent is located.

Figure 3.1: Interactive Shaping: Providing interactions in the form of rewards to the learning agent.

The primary motivation behind this line of research comes from animal training, where a trainer uses a clicker and some treats to shape the behavior of an animal. In this research direction, the user directly influences the reward channel of the learning system through explicitly generated human rewards. A simple representation of this line of research is shown in Figure 3.1.

The first research work that explored this idea of shaping a learning system's behavior by influencing the reward channel was introduced by Isbell et al. (2000). In this work, a virtual agent is located in an active online community frequented by hundreds of users. The users interacted with this agent through some predefined text-based reward signals. The virtual agent, by keeping track of certain statistics, learned to adapt its behavior to maximize this user-generated reward signal.

More recent work, conforming to this research trend, was conducted by Thomaz and Breazeal (2006, 2008). Here, the authors extensively studied how users teach a virtual reinforcement learning agent to perform a complex task. Specifically, the users provided rewards to the learning agent with the sole intention of guiding the system to complete a given task. By analyzing the reward generation patterns produced by the users while teaching their agents, they observed that each user followed a different approach for guiding the learning system; there was no standard strategy adopted by the users. It was also observed that the interacting users had different views about the rewards that were provided to the agent. In particular, some users frequently provided positive rewards with the intention of motivating the learning agent, and this

vastly differs from a reinforcement learning perspective, which views rewards as evaluations for taking a particular action or a sequence of actions. Pilarski et al. (2011) pushed this line of research forward by showing that a virtual prosthetic limb agent can learn complex behaviors through online training from rewards generated by a user. Another related study, conducted by Suay and Chernova (2011), extended the work of Thomaz and Breazeal (2008) to teaching a robot to sort different objects. Both of these works were probably the first to look at teaching an interactive learning system using human-generated rewards alone. Many recent works have leveraged this fundamental idea of training an interactive agent with human-generated rewards, leading to many valuable contributions in the field of interactive machine learning. Knox and Stone (2013) successfully trained a robot with human-generated rewards. Iturrate et al. (2010) trained a supervised learning algorithm to classify different EEG patterns, obtained from an interacting user, as positive and negative reward signals. More recently, this approach of classifying EEG patterns as rewards, introduced by Iturrate et al. (2010), was used for teaching a neuroprosthesis arm to perform simple control tasks (Iturrate et al., 2015) and for controlling the steering angle of a car (Zhang et al., 2015). All of these research works assume that the distribution of human-generated rewards is consistent throughout an experimental setting. However, this turned out to be false, as per the findings of Thomaz and Breazeal (2008): the interacting user develops a mental model of the learning agent, and as the agent improves its behavior, the frequency of user-generated rewards begins to reduce. This leads the agent to unlearn, degenerating its behavior towards something that is undesirable to its user. In order to address this important issue, where an interacting user modifies his or her reward generation pattern, Knox and Stone (2009) introduced the idea of training a supervised learning algorithm with multiple state-action pairs and their corresponding user-generated rewards. Once this supervised learning method reasonably learns a reward-generation model, the interacting user is replaced by this reward generator. This approach allows the interacting agent to

receive consistent rewards throughout the task. Though this simple approach of Knox and Stone (2009) addresses an important problem in interactive machine learning, it opens up a significantly more difficult issue of adaptability in the learning agent. Specifically, when the task changes, ever so minutely, the user needs to interrupt the agent and reprogram this supervised learning method with new training data that is tailored to the modified task. Another noted disadvantage of their approach is that if the task is complex and realistic, like controlling a prosthetic limb or a self-driving car, it is impossible to scale up their approach, as it requires the user to label all possible state-action pairs with a human-generated reward, which is cumbersome.

3.2 Interactions in the Form of Demonstrations and Instructions

A closely related line of research that is often used for teaching learning agents is providing interactions in the form of demonstrations or instructions, generally provided by non-expert users. Learning from demonstration is a key idea, notably developed by Abbeel and Ng (2004), through which a learning agent can be trained to follow a behavior demonstrated by a user. This demonstration is the only interaction that happens between the user and the learning agent. Specifically, an error function is formulated based on the demonstration provided by the user, which is then used for training the agent. Many of the subsequent research works in learning from demonstrations (e.g., Koenig and Mataric, 2012; Schulman et al., 2013; Alizadeh et al., 2014) rely on the fundamental idea described above. In order to train a learning agent to perform a particular behavior, it needs to know the expected sequence of actions, which is broadly provided as interactions in the form of demonstrations. Based on these demonstrations, the agent learns to take a sequence of actions that closely matches the demonstrations. This limits the approach because the learning agent will not know the right action to take when faced with an unseen part of the environment. To overcome this limitation, probabilistic approaches were developed to query the user about the right

sequence of actions (Cakmak & Thomaz, 2012). Recently, Vasan and Pilarski (2017) introduced an approach for training a myoelectric prosthetic arm from online demonstrations provided by the interacting user. In their approach, they generate reward signals for the learning agent that punish behaviors that are far from the user's demonstrations. Over time, their approach allows the agent to learn a behavior that closely matches that of the interacting user. Some research works have also looked at using explicit instructions (Breazeal, 1998; Liu & Picard, 2003), which can be either verbal or non-verbal cues, for allowing people to teach their learning agents to perform a task. Though this works in simple domains, it does not scale to many realistic settings because it often imposes additional cognitive load on the users.

3.3 Relevance of these Related Works to this Thesis

Perusing many of the research works in the field of human-machine interaction, it becomes obvious that most of the approaches are focused on shaping the agent through rewards and punishments or through demonstrations. Though these are simple approaches to interactive machine learning, they lead to issues in changing environments (or tasks). Specifically, whenever the task changes or is modified, the user needs to provide more rewards or demonstrations in order to modify the agent's behavior. Here, we hypothesize that if a learning agent can learn and understand the meaning of various cues from the user's body language, then the agent can quickly learn to perceive evaluative feedback from the user's body language and need not rely on explicit interactions from the user. More importantly, this allows the agent to adapt its behavior according to the user's expectations by picking up on their body language cues. It is important to point out here that our approach is the first that learns the meaning of various subtle body language cues produced by the user during the process of interaction.

Chapter 4

Learning From Prospective Body Language

The fundamental contribution of this thesis is to introduce a human-machine interaction approach that allows people to teach their machines using whatever subtle body language cues are produced during the process of interaction. As opposed to existing approaches, the meaning of these subtle cues is adaptively learned by the machine from ongoing interactions with the user. As the machine learns the meaning of these subtle cues, the user can teach the machine to perform a given task according to their preferences. By teaching the machine through these adaptively learned non-verbal cues, the agent completes these tasks much faster than a conventional user-interactive agent and consequently maximizes user satisfaction. In this chapter, we begin with the motivation behind our approach and then introduce our idea. We also ground our approach relative to existing approaches that allow machines to learn from body language.

4.1 Towards Natural Forms of Human-Machine Interaction

Human communication is intuitive and sustainable, allowing us to form effective teams that collaboratively work towards solving complex problems. The process of human communication involves both non-verbal cues and speech. In the near future, we need to form similarly effective teams with intelligent

machines, so that we can tackle more challenging problems. Ultimately, we hope to amplify our cognitive and physical capabilities through machines. As a first step towards achieving this grand prize of intelligence amplification, we need to design approaches that allow people to communicate naturally with machines, where the communicative process is as intuitive as human communication. Ideal forms of human-machine interaction need to enable machines and their users to communicate naturally, similar to human communication. In this thesis, we focus on using body language to improve human-machine interaction, similar to how these cues are used in human communication. People naturally produce different non-verbal cues, without much effort, during the process of communication (Frith, 2009; Pezzulo et al., 2013; Scheider et al., 2016). These subtle cues differ across people and are usually conditioned upon their current situation or environment. These cues are produced as communicative signals and influence the behavior of their observers. For example, while giving a talk to an audience, we usually make accurate judgments about what the audience is currently experiencing or understanding and use these judgments to adapt our talk. Our ability to perceive others' intentions and motivations or their levels of satisfaction from their body language is fundamental to human communication (Adolphs, 2002). Similar to human communication, people also produce subtle cues during their interactions with their machines in human-machine interaction domains. More importantly, these cues are produced unintentionally, without any cognitive effort, and are indicative of the user's satisfaction towards the machine's behavior (Abdić et al., 2016). A natural direction for improving human-machine interaction is therefore to use these subtle cues to teach machines to perform a collaborative task.

4.2 Our Approach: Learning from Prospective Body Language Cues

As described previously, in this thesis we introduce a human-machine interaction approach that allows people to teach their machines through subtle body language cues. Particularly, these non-verbal cues are produced unintentionally by the user interacting with their machine, where the machine is learning to perform a collaborative task according to the user's preferences. The body language cues produced during ongoing interactions are indicative of the user's satisfaction towards their machine's behavior. The machine learns to associate these cues with their user's satisfaction, which is measured in terms of occasional explicit feedback. Over time, the machine learns to predict the future rewards it would receive from the user directly from their body language. As the machine learns to make predictions about its future rewards from the user's body language, it can adapt its subsequent behavior in a given task even before the user generates explicit feedback, thereby maximizing user satisfaction. As these cues are used by the learning agent to forecast its future rewards, we call them prospective body language cues.

Figure 4.1: A human-machine interaction setup where the agent learns from prospective body language cues to complete a given task. Here, the agent learns to solve a task from facial expressions of the user.

Our human-machine interaction approach of learning from prospective body language assumes that a user is interacting with an implemented learning agent in a human-machine collaborative task. An instance of this setup is shown in Figure 4.1. The goal for both the user and the agent is to successfully complete the given tasks. Specifically, the goal for the learning agent is to figure out the right sequence of actions to take in order to complete the given tasks as fast as possible and, as a result, maximize user satisfaction. The goal for the user is to guide the learning agent in completing these given tasks. The human-machine interaction setup used in this thesis involves a user interacting with an agent. The user observes the agent's actions and interactions within an environment as it learns to perform the given collaborative task. The interactions from the user are in the form of non-verbal cues and occasional negative rewards. The agent observes these cues throughout the course of the task and receives these occasional rewards. The process of providing a negative reward to the interacting agent is called explicit feedback and is achieved by the user pressing a button on the keyboard. We assume that, in real-world collaborative tasks, it is easier to provide evaluative feedback in the form of body language than to generate explicit feedback. During the interactive task, the user produces non-verbal cues that are indicative of the agent's performance in the task and occasionally provides explicit feedback. As the agent observes these subtle cues and receives this explicit feedback, it learns to associate the cues with its future rewards. Specifically, the agent learns a value function that maps the user's body language to its future rewards. As a result, the agent learns to predict user satisfaction, which is measured in terms of the explicit feedback. Over time, the agent learns to use these predictions to adapt its behavior before it receives explicit feedback from the user. In short, the agent perceives the user's body language cues as evaluative signals, critiquing and subsequently influencing the agent's behavior so as to complete the task faster and according to the user's preferences. This directly maximizes user satisfaction because the agent learns to quickly complete the task as preferred by the user. Learning the meaning of these non-verbal cues and learning an appropriate

behavior based on these cues occur together and are handled by conventional temporal-difference learning methods. Our method implicitly involves two processes: first, the agent learns the meaning of different non-verbal cues in the form of a value function, and second, it uses these predictions to learn an appropriate behavior that maximizes its rewards. Overall, our method allows people to teach their machines using whatever subtle body language they produce during interaction. This is a general approach and encompasses any subtle non-verbal cue. These cues are associated with future rewards and learned as a value function. As these cues forecast the future rewards received by the learning agent, they are called prospective body language cues, and we say that the agent is learning from prospective body language. However, for the experiments in this thesis, we restrict our approach to the subtle cues produced in the form of facial expressions by the user while interacting with their machine, and we call this restricted approach face valuing, as the agent learns a value function over the user's facial expressions.
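To illustrate the core of face valuing, the following is a minimal Python sketch under our own simplifying assumptions, not the implementation used in later chapters: a TD-style update of a value function over facial-expression features, so that those features come to predict the user's later explicit feedback.

```python
import numpy as np

def face_value_update(w, e, face_phi, next_face_phi, explicit_reward, alpha, gamma, lam):
    """TD-style update of a value function over facial-expression features (illustrative).

    `face_phi` and `next_face_phi` are feature vectors extracted from the user's face on
    consecutive time steps (the extraction pipeline is assumed), and `explicit_reward` is
    the occasional user-generated feedback (e.g. -1 on a key press, 0 otherwise).
    """
    delta = explicit_reward + gamma * np.dot(w, next_face_phi) - np.dot(w, face_phi)
    e = gamma * lam * e + face_phi               # eligibility trace over face features
    w = w + alpha * delta * e                    # the observed expression's value moves toward the target
    return w, e
```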

4.3 Existing Approaches that Use Body Language

In the previous sections, we described our approach that allows people to teach their machines using subtle body language cues, where these cues inform the agent about its future rewards. However, using body language to teach machines is a natural idea in the field of human-machine interaction, and naturally, many existing approaches have attempted to improve human-machine interaction through body language. In this section, we describe the fundamental ideas within the field of human-machine interaction that use body language, in one way or another, to train machines. In the next section, we contrast these with our approach.

4.3.1 Body Language as Instructions to Machines

The primary goal of a machine in a human-machine interaction task is to learn to perform a task according to the user’s preferences. A simple way

of teaching the machine with body language to perform a task is to map different non-verbal cues to different instructions to the machine. In these cases, the user would need to produce the appropriate non-verbal cue so that the machine executes the right instruction. This is the overall idea of the existing approaches described in this section. Faria et al. (2007) designed a wheelchair user interface that can be operated using the user's facial expressions. Their approach involves mapping different facial expressions to control signals. The facial expressions are recognized from the user's face, which is observed through a simple webcam on a portable laptop that sends control signals to the wheelchair. In their approach, the user is required to produce a particular facial expression in order to move the wheelchair in a specific direction. The mappings between the facial expression cues and instructions are predefined by the system designer, implying that the users are required to remember these mappings and translate their expressions accordingly to effectively control these wheelchairs. Similar approaches were also developed by many researchers; some of the popular ones are: Wei et al. (2009), Lievesley et al. (2011), Faria et al. (2012), Rechy-Ramirez et al. (2012), Tanaka et al. (2005), Galan et al. (2008), Ferreira et al. (2007), She et al. (2014), and Cruz et al. (2015). Many of these approaches used different methods for extracting facial expressions and/or EEG signals from the user. However, all of them still rely on the system designer to map these cues to different control signals. More importantly, these approaches require the user to remember the body language mappings and translate their cues to effectively control the system. Reis et al. (2009) designed a comprehensive framework that allowed users to control their wheelchairs through a combination of modalities, like voice, facial expressions, head movements, keyboard, and joystick. More importantly, their framework allowed the users to program their own sequence of cues to a control signal, prior to the operation of the wheelchair, as opposed to approaches that rely on the system designer to map cues to commands.

4.3.2 Body Language as Rewards to Machines

In this body of research, the general idea is to map non-verbal cues to reward signals, that is, to shape the existing reward channel of the learning agent using the user's body language. More specifically, a specific body language cue results in a certain reward being produced for the agent. Many of the approaches in this line of research rely on supervised learning approaches to classify different non-verbal cues, and the mapping of these cues to different reward signals is defined by the system designer. Some approaches use verbal instructions as rewards instead of non-verbal cues. There are no research works, to the best of my knowledge, that use body language as reward signals to machines. However, this is an obvious idea that is fundamentally the same as interacting with machines through user-generated rewards. As it is similar to learning from user-generated rewards, this research direction would also have similar issues, some of which are described below. The first issue that arises when teaching a machine using user-generated rewards is that the reward distribution produced by the user changes as the experiment progresses. Specifically, during the initial part of the experiment, the user provides more rewards and actively continues this until the agent learns a desired behavior. After the agent learns to behave in an appropriate manner, the user does not feel the necessity to continue with the interaction process. Ideally, the user should not need to provide rewards when the agent behaves in the right way. However, reinforcement learning algorithms cannot handle this, driving the agent to explore different actions, which as a result deteriorates its behavior. In order to generate a stable stream of user-generated rewards, Knox and Stone (2009) introduced the TAMER framework, where a supervised learning model of the user-generated rewards is first learned from the user and then used in place of the user. This stabilizes the interacting agent, which executes a behavior that is appropriate for the user. However, when the task changes or when the agent is faced with a new task, the user needs to intervene and reprogram the supervised model in order to handle this

change of task. An important requirement, and another potential limitation, in this line of research is that complex training datasets need to be created in order to recognize subtle cues that are specific to the user.

4.4 Comparison to Existing Human-Machine Interaction Approaches

In the field of human-machine interaction, the existing ideas view body language either as instructions or as reward signals to the machines. More importantly, the approaches resulting from these ideas assume that people produce similar non-verbal cues to convey their reactions towards the machines. This is a flawed assumption to make, because different people produce different body language to convey what they feel about their current situation. Designing approaches for human-machine interaction based on this assumption would require the users to translate their body language to conform with their machine's design. Using such approaches in real-world human-machine interaction tasks, like the ones involving prosthetic limbs or autonomous cars, would require the user to devote some part of their attention to effectively interacting with the machine. Consequently, this imposes a cognitive load on the user, making them lose focus on the task at hand. Ideally, this scenario needs to be turned around. The machine needs to adapt to the user, learning from whatever subtle cues they produce during the process of interaction, rather than requiring the user to adapt to the machine. As opposed to these existing approaches, the approach we introduce here does not assume that all people produce similar body language. More importantly, our approach learns the meaning of the different non-verbal cues directly from ongoing interactions with the user. The machine learns to associate whatever cues the user produces with user-generated rewards and, over time, learns to predict the user's feedback from their body language. The meaning of these non-verbal cues is grounded in the later rewards the machine receives from the user. As the machine learns to predict its future rewards

from the user, it can then adapt its behavior in order to maximize its rewards and thereby maximize user satisfaction. This is different from any existing approach to human-machine interaction because our approach learns the meaning of non-verbal cues from the user.

Chapter 5

Using Sarsa To Learn From Prospective Body Language

In the previous chapter, we introduced our idea of teaching machines through body language cues. Particularly, the machine learns to pick up whatever subtle cues people produce during the process of interaction and to associate these cues with user satisfaction. By learning a value function that maps these non-verbal cues to future rewards, users can teach their machines to act according to their preferences, thereby teaching the machine to complete the given tasks quickly. In contrast to our introduced approach, the existing approaches that learn from body language usually rely on the system designer to map these different cues to instructions that are executed by the machine. Some methods map these cues to reward signals for the learning machine. As these mappings to rewards or instructions are non-adaptive, they require the user to translate their body language to effectively use such machines, and as a result they cause significant cognitive load on the interacting user. In this chapter, we implement our approach using a standard action-value learning algorithm, namely Sarsa, and compare its performance with another agent that learns only from user-generated rewards. Specifically, we implement our approach to learn from the subtle facial expression cues produced by the user while interacting with the machine, whereas the conventional agent relies only on the user-generated rewards to figure out the right way to complete the given tasks. Though our approach is general enough to learn from any body language

cue, here we restrict ourselves to the facial expression cues of the user; we call this approach face valuing.

5.1 Implementation using Sarsa

The face-valuing approach involves a user interacting with an implemented agent that is learning to complete a given human-machine collaborative task. In order to complete this task quickly, the user needs to communicate their preferences to the machine. Particularly, the agent needs to figure out the right action to take, according to the user, in order to complete the task. A simple way of realizing this approach as an interactive agent is by using a standard reinforcement learning algorithm called Sarsa. This is a simple and popular action-value learning algorithm. Sarsa extends temporal-difference learning to learn an action-selection policy which can be used to pick actions that maximize the rewards achieved by the learning agent. For a comprehensive description of this algorithm, refer to the textbook by Sutton and Barto (2017). Experiment Setup. The face-valuing approach involves a setup where a user interacts with a virtual agent, observing the agent through a display. At every time step, the agent receives a new raw image from a standard webcam. This image is assumed to contain the user's face and is processed into features through a standard facial detection algorithm; these features are then tile-coded to form the feature representation for the agent. These features approximate the current state of the agent and are generated at each time step from the facial expression cues of the user. The reinforcement learning agent learns to associate these feature representations with its expected rewards, learned from occasionally generated explicit corrective feedback from the user. Based on these action-value predictions, the agent learns to take actions that maximize user satisfaction (i.e., minimize the explicit feedback received over time). Through this approach, the agent learns to take actions prior to the user generating any explicit feedback. In subsequent sections, we describe the process involved in constructing

the state representation for the face-valuing agent and the algorithm as pseudocode. In later sections, we describe the simulated human-computer collaborative task that is used in our experiments and report the results obtained from the face-valuing Sarsa agent.

Figure 5.1: State representation for face-valuing agent: Features extracted from a user’s face.

5.1.1 State Representation: Processing Raw Images into Facial Expression Features

The state representation is perhaps the most important component of a reinforcement learning agent. It allows the agent to identify its current state within an environment. By reliably identifying its current state, the agent learns to take actions so as to maximize its reward signal from the environment. More generally, we can describe the state representation as the "eyes" of the agent, which enable it to identify and represent the different states that it observes during its interactions with the environment. The state representation for the agent needs to be designed so that it can identify various states reliably from its features. The state space of the face-valuing agent allows it to learn its expected reward or punishment grounded in the facial cues of its user from ongoing interactions. For our experiments, we use a simple facial feature detection algorithm which extracts interesting feature points from the user's face. The features extracted from a user's face are shown in Figure 5.1. The face-valuing agent observes a frame containing the user's face obtained from a webcam. This image frame, which is assumed to contain the user's face,

is processed and 68 facial landmark points are extracted using a popular facial landmark detection algorithm (Kazemi et al., 2014). These facial landmark points are simple two-dimensional coordinates over the raw input frame that denote the position of certain special locations of the user's face, as shown in Figure 5.1. Particularly, these landmark points localize salient regions of the face, namely the eyes, eyebrows, nose, mouth and jawline. These points are also called key points and are usually used for aligning multiple images taken from different viewpoints. However, they are sufficient for our purpose and our experiments as they provide a crude approximation of the facial expression produced by a user. At every time step, the agent receives a raw frame from the webcam, from which it extracts these 68 key points. These key points are normalized by subtracting out their mean value and dividing them by their standard deviations. Out of these 68 key points, 23 points are selected, excluding the points corresponding to the jawline. Each of these 23 points is individually tile-coded with 4 tilings, each of size 10 × 10. This results in a feature vector of size 9200, and this vector forms a part of the state space. It is referred to as φ(s). Facial landmark points are chosen as features for representing facial expression cues because these points are easy to compute from raw images in real time, without involving much computation. This is appealing because we are concerned with building a real-time human-computer interaction approach that allows users to communicate with their learning agents through ongoing interactions. The form of communication involves using different facial cues, with their meaning learned from ongoing interactions involving infrequent explicit feedback.
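To make the feature construction concrete, the following is a minimal sketch of the normalization and tile-coding step described above. The landmark detector itself is not shown; the function takes a (68, 2) array of landmark coordinates, as would be produced by the detector of Kazemi et al. (2014). The index set of the 23 retained points, the assumed coordinate range, and the tiling offsets are illustrative assumptions, not the exact values used in the thesis.

    import numpy as np

    NUM_POINTS = 23          # landmark points kept after discarding the jawline
    NUM_TILINGS = 4          # tilings per landmark point
    TILES_PER_DIM = 10       # each tiling is a 10 x 10 grid
    FEATURES_PER_POINT = NUM_TILINGS * TILES_PER_DIM * TILES_PER_DIM   # 400
    FEATURE_SIZE = NUM_POINTS * FEATURES_PER_POINT                     # 9200

    def face_features(landmarks, keep_idx, coord_range=3.0):
        """Binary feature vector phi(s) built from a (68, 2) array of landmarks."""
        pts = np.asarray(landmarks, dtype=np.float64)
        pts = (pts - pts.mean(axis=0)) / pts.std(axis=0)    # normalize coordinates
        pts = pts[keep_idx]                                 # keep the 23 selected points

        phi = np.zeros(FEATURE_SIZE)
        for i, (x, y) in enumerate(pts):
            for t in range(NUM_TILINGS):
                offset = t / NUM_TILINGS                    # shift each tiling slightly
                col = int(np.clip((x + coord_range) / (2 * coord_range) * TILES_PER_DIM
                                  + offset, 0, TILES_PER_DIM - 1))
                row = int(np.clip((y + coord_range) / (2 * coord_range) * TILES_PER_DIM
                                  + offset, 0, TILES_PER_DIM - 1))
                tile = t * TILES_PER_DIM * TILES_PER_DIM + row * TILES_PER_DIM + col
                phi[i * FEATURES_PER_POINT + tile] = 1.0
        return phi

Each landmark point activates exactly one tile in each of its 4 tilings, so 92 of the 9200 features are active at any time step.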

5.1.2 Sarsa Learning Algorithm

The reinforcement learning algorithm is the core component of our face-valuing approach. The facial feature detection algorithm is used to construct the state representation from the user's facial expression cues, but the important learning process happens through the following reinforcement learning algorithm.

Particularly, the learning process is responsible for associating facial expression cues with the future rewards experienced by the agent while operating within the simulated environment. In this face-valuing approach, user satisfaction is formalized as an action-value function of a learning agent and is learned through temporal-difference learning.

Algorithm 4 Face Valuing: Implementation using Sarsa

INPUT: α, λ, γ, w_init
w ← w_init        ▷ w is the weight vector for the action-value function
e_w ← 0           ▷ e_w is the eligibility trace for the action-value function
for each episode do
    obtain initial state S
    construct the features φ(S) corresponding to the initial state S
        (from the user's face, as described in Section 5.1.1)
    select action A based on the state features φ(S) (for example, ε-greedy)
    ψ ← features corresponding to (S, A)
    while S is not terminal do
        take action A, observe next state S′ and reward R
        construct the features φ(S′) corresponding to the next state S′
        select action A′ based on φ(S′)
        ψ′ ← features corresponding to (S′, A′)
        δ ← R + γ w⊤ψ′ − w⊤ψ
        e_w ← γλ e_w + ψ
        w ← w + α δ e_w
        S ← S′;  A ← A′;  ψ ← ψ′
    end while
end for

At every time step, the images obtained from the webcam are processed into features corresponding to the facial expression cues of the user who is interacting with the agent. Using these features, the agent estimates the value of each action available to it. The learning agent consists of a set of parameters w ∈ R^n, which is incrementally updated from the rewards received by the agent. This set of parameters, when multiplied with the features corresponding to a state-action pair, gives an estimate of the expected reward that would be received by the agent for picking the corresponding action. As the algorithm

learns the utility of each state-action pair, it is called an action-value method, and one such action-value algorithm, called Sarsa, is used for constructing the face-valuing agent. This is described as pseudocode in Algorithm 4.
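For concreteness, the following is a minimal sketch of Algorithm 4 as a linear Sarsa(λ) agent in Python. The feature construction and the environment loop are assumed to be supplied by the caller; the class name, the default step sizes, and the stacked per-action weight layout are illustrative assumptions rather than the exact implementation used in the experiments.

    import numpy as np

    class LinearSarsaAgent:
        """Sarsa(lambda) with a linear action-value function and accumulating traces."""

        def __init__(self, n_features, n_actions, alpha=0.1, gamma=1.0,
                     lam=0.9, epsilon=0.1):
            self.n_actions = n_actions
            self.alpha, self.gamma, self.lam, self.epsilon = alpha, gamma, lam, epsilon
            # One weight block per action: psi(S, A) places phi(S) into action A's slot.
            self.w = np.zeros((n_actions, n_features))
            self.e = np.zeros_like(self.w)

        def q(self, phi):
            return self.w @ phi                        # action values for features phi

        def select_action(self, phi, available=None):
            actions = list(available) if available is not None else list(range(self.n_actions))
            if np.random.rand() < self.epsilon:
                return int(np.random.choice(actions))  # explore
            qs = self.q(phi)
            return max(actions, key=lambda a: qs[a])   # greedy among available actions

        def start_episode(self):
            self.e[:] = 0.0

        def update(self, phi, a, reward, phi_next, a_next, terminal):
            q_next = 0.0 if terminal else self.q(phi_next)[a_next]
            delta = reward + self.gamma * q_next - self.q(phi)[a]
            self.e *= self.gamma * self.lam
            self.e[a] += phi                           # accumulate the trace for (S, A)
            self.w += self.alpha * delta * self.e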

5.2 Grip-Selection Task

We introduce a grip-selection task, inspired by a difficult problem from the prostheses domain, and use this task to evaluate our face-valuing approach. In this task, the objective for the agent is to select an appropriate grip pattern for a given object and use this grip to successfully grasp the object.

Figure 5.2: Grip-selection task

At the beginning of the experiment, an object is randomly created and shown to the user. This object is located at a distance from the location where the agent's grip is spawned. This initial location is called the "grip-changing station", and here the agent can select from a set of grips. The objective for the agent is to complete this task by selecting an appropriate grip at this grip-changing station, then moving towards the object's location and completing a grasp of this object. A successful grasp of the given object using an appropriate grip completes an episode. The appropriate grip for completing a grasp of this object depends on the user's preference: there could be many different grips available to the agent, and the right one depends on the user. This is similar to many real-world human-machine interaction tasks, like prostheses and self-driving cars, where the goal for the agent is to behave according to the user's preferences. The task consists of a set of n grips and m objects. Depending on the experiment setting, there could be many possible grip-object combinations that

could complete an episode successfully. However, the correct grip-object combination depends on the user, who ultimately evaluates the agent's behavior. The task is formulated as an undiscounted episodic MDP with a reward of 0 at every time step from the environment. Also, there is no reward from the environment for the termination of an episode. The only available reward signal is from the interacting user: by pushing a button, the user can deliver a reward of −1 to the agent for the corresponding time step.

5.2.1 State Space

The state space for the agent without face valuing consists of the given object's ID and the agent's currently selected grip ID, along with a bias term. Specifically, the ID of the object that is displayed in an episode is one-hot encoded to form a binary-valued vector of size m, where m is the total number of objects available in an experiment setting. Similarly, the ID of the agent's currently selected grip is one-hot encoded to form a binary-valued vector of size n, where n is the number of grips available to the agent at its grip-changing station. Both these vectors are concatenated along with a bias term with a value of 1 to form the state's feature vector of size m + n + 1. This forms the state space for the agent without face valuing. For the face-valuing agent, the state space consists of the features corresponding to the user's facial expression cues. The facial landmark points are extracted from the user's face at each time step and processed into a vector of size 9201. This vector is the feature representation corresponding to the user's facial expression cues and represents the state space for the face-valuing agent.
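As a small illustration of the task-specific representation described above, the following hypothetical helper builds the m + n + 1 feature vector from an object ID and a grip ID; the function name and argument order are assumptions made for illustration.

    import numpy as np

    def task_features(object_id, grip_id, m, n):
        """One-hot object ID, one-hot grip ID, and a bias term: m + n + 1 features."""
        phi = np.zeros(m + n + 1)
        phi[object_id] = 1.0        # one-hot encoding of the displayed object
        phi[m + grip_id] = 1.0      # one-hot encoding of the currently selected grip
        phi[-1] = 1.0               # bias term
        return phi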

5.2.2 Action Space

The complete action space for the agents (with and without access to face valuing) consists of the following actions: {grip1, grip2, ··· , gripn, ↑, ↓}, where the first n actions result in selecting the corresponding grip from the set of n grips available within an experiment setting. The remaining two actions move the agent towards and away from the given object.

Though there are n + 2 actions, not all of them are available to the agent at all times. The actions available to the agent depend on its position relative to the object and the grip-changing station. When the agent is at the grip-changing station, the available actions are {grip1, grip2, ··· , gripn, ↑}, whereas when the agent leaves the grip-changing station, the {↑, ↓} actions are available. In other words, the agent can change its grip only at this grip-changing station. Once it leaves this station, the agent cannot change its grip unless it returns to this position.

5.2.3 User-Generated Rewards

As described previously, there are no reward signals from the environment. The sole source of reward is the interacting user. Specifically, when the user pushes the reward button, the agent receives a −1 reward for the corresponding time step. Moreover, on pressing this reward button, the agent loses all its actions except {↓} until it reaches the grip-changing station. The agent observes the state space once every three-tenths of a second and takes an action at every time step. The agent, however, has the freedom of choosing the same action for many consecutive time steps, which allows the user to observe the agent's action and then expressively respond to the agent.
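Putting the reward structure, action availability, and termination rule together, the following is a minimal sketch of the grip-selection task as an episodic environment. Here the user is reduced to a preferred grip and a button-press flag passed into step(); in the thesis the reward and the preference come from a real person, so the class below is an illustrative simulation under stated assumptions, not the thesis code.

    class GripSelectionTask:
        """Episodic grip-selection task: pick a grip at the station, move up, grasp."""

        def __init__(self, n_grips, distance=10):
            self.n_grips = n_grips
            self.distance = distance            # steps from the station to the object
            self.UP, self.DOWN = n_grips, n_grips + 1

        def reset(self, preferred_grip):
            self.preferred_grip = preferred_grip
            self.position = 0                   # 0 = grip-changing station
            self.grip = 0
            self.locked_to_down = False         # set after a user button press
            return (self.position, self.grip)

        def available_actions(self):
            if self.locked_to_down:
                return [self.DOWN]              # only {down} until back at the station
            if self.position == 0:
                return list(range(self.n_grips)) + [self.UP]
            return [self.UP, self.DOWN]

        def step(self, action, button_pressed):
            if action < self.n_grips:
                self.grip = action              # change grip (only offered at the station)
            elif action == self.UP:
                self.position = min(self.position + 1, self.distance)
            else:
                self.position = max(self.position - 1, 0)

            if self.position == 0:
                self.locked_to_down = False     # full control restored at the station
            if button_pressed:
                self.locked_to_down = True

            reward = -1.0 if button_pressed else 0.0
            terminal = (self.position == self.distance
                        and self.grip == self.preferred_grip)
            return (self.position, self.grip), reward, terminal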

5.3 Experiments and Results

The experiments and results in this section involve a comparison between two types of learning agents: one with access to the task-specific features and the other with access to face-valuing features. The agent which uses only the task-specific features does not have access to face-valuing features and is a simple user-interactive approach that learns from human interactions in the form of rewards. The first set of experiments consisted of varying the number of objects and grips available within a task and comparing the learning performances of the agents. In the second set of experiments, both these agents had a limited number of grips available within the task, but a new object was generated for

each episode. More importantly, no object was repeated during this second experiment, essentially making each episode a new task for the learning agent. This is a realistic setting for a real-world grasping task, where there are infinitely many objects available in the world and they need to be picked up using a limited set of grips. Both the agents used in these two experiments were constructed using the Sarsa learning algorithm. Specifically, two Sarsa agents, one with face-valuing features and the other with task-specific features, were evaluated. Throughout the course of the experiment, the user did not know which type of agent they were currently interacting with. In other words, the experiments were conducted in a blind manner and consisted of 15 episodes.

5.3.1 Experiment 1: Multiple Grip and Object Setting

In this first experiment, we varied the number of grips and objects available within the simulated task and compared the learning performances of different user-interaction agents. Specifically, for different object and grip settings, we compared the learning performances of an agent with access to task-specific features and an agent with access to face-valuing features. Each experiment consisted of 15 episodes. At the beginning of every episode, an object and a grip, randomly selected from the set of available objects and grips, are presented to the user through the display. During an episode, the object remains fixed whereas the agent can change its grip to one of its available grips. The episode terminates only on a successful grasp using an appropriate grip.

Figure 5.3: Total time steps taken for different grip and object settings by Sarsa learning agents. Panels (a)–(i) cover the 2, 4, and 8 object settings crossed with the 2, 4, and 8 grip settings.

Figure 5.4: Total number of user-generated rewards (i.e., explicit feedback) provided to the learning agents. Panels (a)–(c) cover the 2-, 4-, and 8-grip settings.

The plots of the total time steps taken and the total user-generated rewards accumulated by the agents during this experiment are shown in Figures 5.3 and 5.4. The plots in Figure 5.3 represent the cumulative time taken by a learning agent to complete a successful grasp over multiple episodes. The plots in Figure 5.4 display the number of times the user provided explicit feedback to the learning agents. Both these graphs were generated from the same user experiments, conducted in a blind manner. A perfect agent that knows its user's intentions would receive no user-generated explicit feedback in any of these settings. Moreover, it would take exactly 11 time steps to complete an episode: one time step for picking the right grip and ten time steps for moving towards and grasping the given object. From the plots in Figure 5.3, the agent with access to face-valuing features adapted quickly to the user's preferences in all the experiment settings. The face-valuing agent achieved this performance with significantly less user-generated feedback than the agent with access to task-specific features.

5.3.2 Experiment 2: Infinite Object Setting

This second experiment was designed to show the performance improvement obtained from face valuing in a difficult and realistic task. In this experiment

setup, a new object is generated at the beginning of every episode in the grip-selection task. Specifically, each episode uses a new object, making each episode a new task for the learning agent. The objective for the learning agent here is to complete the grasp of the given object by selecting a grip from its limited set of grips. This setup explores the ability of a face-valuing agent to leverage previous learning experiences to address new or changed tasks. It is important to note here that, in this experiment setting, an agent with task-specific features cannot learn the user's preferences from those features. This is because each episode is a new task to the learning agent. The task-specific agent can solve this task only through trial and error: by trying out each grip for each object.

Figure 5.5: Total time steps taken to complete the infinite object setting by Sarsa learning agents.

The graphs in Figure 5.5 show the total time steps taken by the learning agents. From this figure, it is quite clear that the face-valuing agent is much quicker in adapting to its user's preferences, which are communicated as facial expression cues. Particularly, the body language of the user does not change much throughout the task, and this allows the face-valuing agent to reuse its previous learning to complete unseen tasks with new objects. This is in contrast to the task-specific agent, which cannot reuse its previous learning because of its state space. More importantly, these results confirm the conclusions drawn from the previous experiment, which is that face-valuing

agents can complete human-machine collaboration tasks much faster than the agent using task-specific features, because they rely directly on the user's body language for figuring out the correct sequence of actions to take. A standard t-test between the learning performances of the agents with and without face valuing gives a p-value of 0.0002, meaning that the results are statistically significant. These values were obtained from a paired t-test using a two-tailed distribution.
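The significance test above can be reproduced with a few lines of Python; the sketch below uses SciPy's paired two-sided t-test on placeholder arrays, since the per-episode totals from the user study are not reproduced here.

    import numpy as np
    from scipy import stats

    # Hypothetical per-episode totals for the two agents (placeholders, not thesis data).
    face_valuing_totals = np.array([12, 11, 13, 11, 12, 11, 11, 12])
    task_specific_totals = np.array([25, 31, 22, 28, 35, 27, 30, 26])

    t_stat, p_value = stats.ttest_rel(face_valuing_totals, task_specific_totals)
    print(f"t = {t_stat:.3f}, two-tailed p = {p_value:.4f}")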

5.4 Discussion

Several studies have shown that users, to a certain extent, are willing to amplify their cognitive and physical abilities through machines. For example, in medical domains, it is common for people with amputations to extend their capabilities or overcome their limitations by forming partnerships with machines (Williams, 2011). However, the current technologies are not sustainable or scalable enough to form successful human-machine partnerships. The fundamental issue that hinders achieving this goal is that the existing technologies are not capable of identifying and adapting to the changing personal preferences of their users. Instead, these technologies expect the user to translate their communicative signals according to the system design, which leads to user frustration towards the system. This issue serves as a serious bottleneck to intelligence amplification. The human-machine interaction approach introduced in this thesis serves as a first step towards achieving this ambitious goal of intelligence amplification. Though our experiments were simulated, we believe that our approach can be much more valuable in a realistic robot setting: we expect a robot's behavior to elicit more expressive facial feedback from the user, in comparison to our simple simulated domains. More importantly, this expressive feedback would serve as powerful features for a face-valuing agent. The grip-selection task is a task where the learning agent needs to figure out the goal from its ongoing interactions with a user. The agent operating within this task can be termed a goal-seeking agent (Pilarski et al., 2015).

In order to demonstrate the significance of our approach, we conducted two sets of experiments: the first one involved multiple object-grip settings on the grip-selection task. The second experiment involved generating a new object for each episode. This infinite object setting is pertinent to real-world scenarios, where there are countless objects that can be grasped with a limited set of grip patterns. In our experiments, we compare learning agents that operate with different state spaces. Particularly, one agent operates with a state space that identifies the given object and the grip currently selected by the agent. Using this task-specific state representation, the agent can learn to identify the correct object-grip combinations from user-generated rewards. The agent learns to pick the correct grip after trying out each possible grip for a given object, and this policy is learned through user-generated rewards. For a face-valuing agent, the state space consists of the user's facial expression cues, which are processed at each time step. This state space conveys the user's preferences or intentions through facial expressions. The agent using this state space learns to associate the meaning of these expressions with its experienced rewards. More importantly, this state space lets the agent figure out the user's intent irrespective of the task, as the representation is independent of the task. This allows the agent to learn faster in a task without trying out every possible grip. Specifically, this significantly reduces the amount of experience required by the agent to learn a behavior according to the user's preferences. The results from the first user experiment (Section 5.3.1) suggest that the face-valuing agent learns to adapt quicker to its user's preferences in a given task, when compared to an agent without access to face valuing. Moreover, from the plots comparing the distribution of user-generated rewards, it can be observed that the face-valuing agent achieves this performance with significantly less explicit feedback from its user. From the second user experiment (Section 5.3.2), we empirically show a scenario where a conventional agent can fail. From both user experiments, we can see that the face-valuing agent successfully adapts and completes one episode after another by relying on helpful clues in the form of facial expressions,

specifically by learning a value function over these facial expressions. On the other hand, the agent without face valuing can rely only on the user-generated reward signals for identifying the correct grip for a given object. In tasks with many possible outcomes, for example a task with many object-grip combinations, relying only on the reward channel to communicate user intents is not a scalable or sustainable approach. From our experiments, we can clearly see that the face-valuing approach allows the agent to perceive evaluative feedback directly from the user's facial expression cues, thereby allowing the agent to figure out the user's preferences in a given task. Our results suggest that, by learning a value function over the user's facial expression cues, the agent can adapt quickly to the user's preferences, requiring less explicit corrective feedback from the user. The learning process took place as follows: during the initial phase of the experiment, the agent used the occasionally provided explicit feedback to learn a value function over the user's facial expressions. These cues served as useful clues about the future rewards that the agent would receive by following its current behavior, thereby guiding and adapting its behavior. Limitations of using action-value methods in our approach. The approach we introduced in this thesis involves learning from whatever subtle cues the user produces while interacting with an agent. It is important to note here that these body language cues and their association with future rewards are task independent, meaning that this learned function can be used in tasks different from the ones used for initial training. Particularly, we can think of scenarios where the user interacts with a simulator, where the agent learns to pick up on different cues specific to the user, and later shares this learning with a real-world machine, like a prosthetic arm or an autonomous car. Sharing and transferring the learning from a user across tasks improves the overall experience across these tasks, as it would involve much less training when compared to learning from scratch. An action-value implementation of face valuing does not exactly conform to this idea of transferability across different domains and tasks, because the values and policy are closely coupled as action values. It would

be desirable to design an approach which learns a value function based on the user's body language independently of the action-selection policy, so that we can share this value function across domains and learn the policy from scratch.

Chapter 6

Using Actor-Critic To Learn From Prospective Body Language

In the previous chapters, we introduced our idea for teaching machines through prospective body language and an action-value approach for realizing this idea as an implemented system. Also, we introduced the process of constructing the state space for a face-valuing agent, along with a simulated grip-selection task for evaluating our approach. Using this simulated task, we conducted experiments comparing the face-valuing agent with a conventional user-interactive agent. In this chapter, we introduce an actor-critic architecture for implementing our idea of learning from body language. A Sarsa agent estimates the utility of each action that is available in a given state, which means that the method relearns information that could actually be reused across these actions. More importantly, an action-value approach does not easily allow transferring the learned value function across domains. To address this specific issue, we introduce an actor-critic approach that reduces the number of parameters that need to be learned to solve a task and easily allows sharing the learned value function across multiple domains, without losing its advantage over a conventional user-interactive agent that learns from user-generated rewards.

6.1 Implementation using an Actor-Critic Method

The setup used in the following experiments is the same as described in the previous chapter. The approach involves a user interacting with an implemented learning agent in a human-computer collaborative task, where the user needs to communicate their preferences effectively in order to quickly complete the given tasks. Here, the user's preferences are over the grips the agent should use in order to complete the task quickly. In the following section, we describe the actor-critic architecture that we use for implementing our face-valuing agent. Actor-critic methods are popular reinforcement learning approaches that consist of two separable learning modules, namely the actor and the critic. The actor is responsible for learning a policy that is used for selecting an action in a given state, and the critic critiques the actions selected by the actor module. Specifically, the critic generates the error signal that drives learning in both these modules. The actor learns a function that produces the likelihood of choosing an action for a given state within the environment. The actor usually increases the likelihood of picking an action if it produced a higher-than-average TD error for a given state, and otherwise it reduces that likelihood. The critic module learns the state-value estimates for the states observed by the agent during its interaction with the environment. A standard TD(λ) algorithm is used for learning these state-value estimates. In this face-valuing approach, user satisfaction is formalized as the critic's value function and is learned through temporal-difference learning methods. The actor module learns a policy that chooses actions according to the critic's estimate for a given state. In the actor-critic method, the action-selection behavior is learned directly from rewards received over time, instead of learning action-value estimates and then computing an action-selection policy. Specifically, the actor module learns an action-selection policy π which is parameterized by θ ∈ R^n. This policy π(·|s, θ) is a function that takes in the state as input and produces the probability of each action available within the environment.

Algorithm 5 Face Valuing: Implementation using Actor-Critic Method

INPUT: α_v, α_π, λ, γ, w_init, θ_init
w ← w_init        ▷ w is the weight vector for the critic
θ ← θ_init        ▷ θ is the weight vector for the actor
e_w ← 0           ▷ e_w is the eligibility trace for the critic
e_θ ← 0           ▷ e_θ is the eligibility trace for the actor
for each episode do
    obtain initial state S and φ(S) from facial features
    while S is not terminal do
        select action A for state S (using ψ(S))
        take action A, observe S′ and R
        δ ← R + γ w⊤φ(S′) − w⊤φ(S)
        e_w ← γλ e_w + φ(S)
        w ← w + α_v δ e_w
        e_θ ← γλ e_θ + ∇_θ log π(A|S, θ)
        θ ← θ + α_π δ e_θ
        S ← S′;  φ(S) ← φ(S′)
    end while
end for

The actor-critic method also consists of a critic module that critiques the actions selected by the actor. The critic is parameterized by a different set of parameters w ∈ R^n. Usually, a standard TD learning algorithm (as described in the background section) is used for learning the critic's parameters, and the TD error generated by the critic drives the learning in the actor module. In simple terms, the actor module learns to increase the likelihood of picking an action in a given state when it produces a positive TD error; otherwise, this likelihood decreases. The state representation is usually shared between the actor and critic modules. However, here we choose to use the critic's prediction for the current state as input to the actor module (denoted as ψ(S) in Algorithm 5). Intuitively, it makes sense to pick one action when the state has a larger estimate of expected reward and to pick the other action when the state has a smaller value estimate, especially in domains where the action space can be parameterized as a continuous spectrum between two opposing actions. Correspondingly, for

the states whose estimates lie between these two extremes, the actions lying across this spectrum are picked. This is achieved by our modified actor-critic approach and is described as pseudocode in Algorithm 5. However, for our experiments, we use a discrete action space. Furthermore, the potential trade-offs of using these two algorithms for implementing the face-valuing approach are discussed later.
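The following is a minimal sketch of Algorithm 5 under stated assumptions: the critic is linear in the facial features φ(s), and the actor is a softmax policy over ψ(s), here taken to be a simple binary encoding of the critic's value estimate (a single-tiling stand-in for the one-dimensional tile coder discussed in Section 6.3). The class name, the default step sizes, and the assumed value range are illustrative, not the thesis implementation.

    import numpy as np

    class FaceValuingActorCritic:
        """Actor-critic with a linear critic over phi(s) and a softmax actor over psi(s)."""

        def __init__(self, n_features, n_actions, n_value_bins=16,
                     alpha_v=0.1, alpha_pi=0.01, gamma=1.0, lam=0.9):
            self.n_actions, self.n_value_bins = n_actions, n_value_bins
            self.alpha_v, self.alpha_pi, self.gamma, self.lam = alpha_v, alpha_pi, gamma, lam
            self.w = np.zeros(n_features)                       # critic weights
            self.theta = np.zeros((n_actions, n_value_bins))    # actor weights
            self.e_w = np.zeros(n_features)
            self.e_theta = np.zeros_like(self.theta)

        def value(self, phi):
            return self.w @ phi

        def psi(self, phi, v_min=-20.0, v_max=0.0):
            """Binary encoding of the critic's value estimate (assumed value range)."""
            v = np.clip(self.value(phi), v_min, v_max)
            idx = int((v - v_min) / (v_max - v_min) * (self.n_value_bins - 1))
            out = np.zeros(self.n_value_bins)
            out[idx] = 1.0
            return out

        def policy(self, phi):
            prefs = self.theta @ self.psi(phi)
            prefs -= prefs.max()                                # numerical stability
            p = np.exp(prefs)
            return p / p.sum()

        def select_action(self, phi):
            return int(np.random.choice(self.n_actions, p=self.policy(phi)))

        def start_episode(self):
            self.e_w[:] = 0.0
            self.e_theta[:] = 0.0

        def update(self, phi, a, reward, phi_next, terminal):
            v_next = 0.0 if terminal else self.value(phi_next)
            delta = reward + self.gamma * v_next - self.value(phi)
            # critic: TD(lambda) update of the state-value weights
            self.e_w = self.gamma * self.lam * self.e_w + phi
            self.w += self.alpha_v * delta * self.e_w
            # actor: eligibility of grad log pi for a linear softmax policy
            psi = self.psi(phi)
            grad = -np.outer(self.policy(phi), psi)
            grad[a] += psi
            self.e_theta = self.gamma * self.lam * self.e_theta + grad
            self.theta += self.alpha_pi * delta * self.e_theta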

6.2 Experiments and Results

The experiments and results in this section involve a comparison between two types of learning agents: one with access to the task-specific features and the other with access to face-valuing features. The agent which uses only the task-specific features does not have access to face-valuing features. Throughout the course of the experiment, the user did not know which type of agent they were currently interacting with. In other words, the experiments were conducted in a blind manner. Each experiment consisted of 15 episodes.

Figure 6.1: Left: Total time steps taken to complete the infinite object setting by the actor-critic agents. Right: Total time steps taken to complete the infinite object setting; experiment performed by 5 different users with no prior experience in this task.

In this experiment, the simulated grip-selection task consisted of a finite

number of grips. At the beginning of each episode, a new object was generated and presented to the interacting user. This object was located at a distance from the location where the agent and its grip are spawned. The objective for the learning agent is to successfully complete a grasp of the given object using an appropriate grip chosen from this limited set of grips. It is important to note here that the agent can switch its grip only at the grip-changing station. The agent cannot switch its grip when it is away from this grip-changing station. This experiment was designed to show the performance improvement achieved through our face-valuing approach in a difficult and realistic task, where the agent needs to grasp a new object that is generated in each new episode. The experiment tests the utility of reusing the agent's previous learning experiences within the grip-selection task. Each experiment involved 15 episodes, and the results obtained from our experiments are reported in Figure 6.1. The figure on the left was obtained from a user who had previous experience interacting with the face-valuing agent and the simulator, and the figure on the right was obtained from 5 different users who had no prior experience with the approach or the simulator. The graphs report the cumulative time taken by the corresponding learning agents in completing a set of 15 episodes, each of which involved the agent selecting an appropriate grip and successfully grasping a new object. From the plots, it is clearly observed that the face-valuing agent achieves a much better performance than a conventional agent that learns from user-generated rewards. Particularly, the face-valuing agent has the ability to reuse its previous learning experiences to solve new tasks, where this ability arises from the state space constructed using the user's body language cues. On the contrary, the task-specific agent operates with a state space that has no potential to reuse or leverage previous experiences, as each episode generates a new object. A standard t-test between the learning performances of the agents with and without face valuing gives a p-value of 0.005 (for Figure 6.1, Left) and a p-value of 0.02 (for Figure 6.1, Right), meaning that the results are statistically

significant. These values were obtained from a paired t-test using a two-tailed distribution.

6.3 Discussion

In this chapter, we introduced an actor-critic approach for face valuing, which learned a value function grounded in the user's facial expressions, independent of the action-selection policy. The motivation behind this is that, ideally, we would want to share this learned value function across multiple domains, as it is task-independent. Particularly, the user's body language would remain the same and have the same meaning irrespective of the task the machine is performing. We can leverage this insight by sharing the learned value function across different tasks and, as a result, improve the overall user experience with the machine. It is important to note here that this actor-critic approach allows transferring the learned value function across multiple domains without much effort, which is not possible in the action-value version of face valuing. In the actor-critic implementation of face valuing, the policy operates over a state space that is based on the critic's estimate. Specifically, the critic's value estimate is tile-coded using a one-dimensional tile coder, and this forms the state representation for the actor module. This is a simple representation for the actor module because, essentially, the module learns to pick one action when the estimate is below a certain threshold and to pick other actions otherwise. Such a policy representation is suitable only for certain tasks, like navigation or tasks with binary choices. The representation capability of this actor module is sufficient to learn a policy over a linear spectrum of an action. For example, if the agent is learning to navigate, we can formulate a continuous action space for such a domain where an action at one end of the spectrum moves the agent forward at maximum speed and an action at the opposite end of the spectrum moves the agent in reverse at maximum speed. Similarly, the actions between these extremes produce similar movements at slower speeds.
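For reference, a one-dimensional tile coder of the kind described above can be written in a few lines; the number of tilings, the tiles per tiling, and the value range below are assumptions chosen for illustration.

    import numpy as np

    def tilecode_1d(value, v_min=-20.0, v_max=0.0, n_tilings=4, n_tiles=8):
        """Binary features for a scalar value using several offset tilings."""
        features = np.zeros(n_tilings * n_tiles)
        tile_width = (v_max - v_min) / n_tiles
        for t in range(n_tilings):
            offset = t * tile_width / n_tilings          # shift each tiling slightly
            x = np.clip(value + offset, v_min, v_max)
            idx = min(int((x - v_min) / tile_width), n_tiles - 1)
            features[t * n_tiles + idx] = 1.0
        return features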

Chapter 7

Discussions and Extensions: From Facial Expressions to Body Language

In this thesis, we introduced approaches that allow users to communicate with their machines through simple and intuitive channels, specifically through their body language cues. These cues are used to forecast the machine's future rewards, and because of this, these cues are called prospective body language cues. We strongly believe that this work can serve as a first step towards building more natural and higher-bandwidth channels of communication between humans and machines. There is much scope for improving our approach in order to handle different and complex forms of cues generated by the user, thereby moving closer towards achieving intelligence amplification. Making progress along these avenues should produce many successful human-computer collaborations, especially ones that are suitable for real-world domains. In this chapter, we discuss a few important questions related to our approach and the potential future directions for extending our approach.

7.1 What is Missing in Existing Human-Machine Interaction Approaches?

Human-machine interaction approaches are supposed to enable people to form effective collaborations with their machines, resulting in people amplifying

their cognitive and physical capabilities. In order to accomplish this ambitious objective, the process of communication between people and machines needs to closely follow the principles underlying human communication, so that the two processes of communication are comparable. Human communication is natural and intuitive because it relies on both speech and body language. In this thesis, we focus on using subtle body language cues to improve human-machine interaction, similar to how body language is used in human communication. Particularly, in human communication, body language cues influence the behavior of their observers and are produced involuntarily without much effort. As a first step towards achieving sustainable human-machine interaction, we introduced our approach that allows people to naturally teach their machines using the subtle body language cues that are produced during the process of interaction. As studied in human communication, the body language of the interacting user influences the machine's behavior. The existing approaches in the field of human-machine interaction are not comparable to the process of human communication. Specifically, these approaches do not allow people to communicate naturally with machines as they do with their peers. The existing approaches that use body language for teaching machines often rely on supervised learning approaches and on the system designer to map these cues to instructions or rewards. These supervised approaches require labeled datasets for recognizing the different non-verbal cues produced by people. However, it is a difficult, cumbersome and time-consuming process to create an extensive labeled dataset for recognizing nuanced non-verbal cues that are produced by people in a variety of situations. The existing approaches to human-machine interaction map body language to instructions or rewards, assuming that these non-verbal cues have similar meanings across different people. The body language of people differs with their situations and preferences. In such cases, the machines require their users to produce unnatural cues in order to be taught effectively. A similarly unintuitive process occurs in approaches that use body language as instructions to the

machine. Ideally, the meaning of these cues needs to be learned from ongoing interactions by the machine, as different cues can have different meanings that depend on the people who produce them. In this thesis, we introduced an approach that does not rely on large supervised learning datasets for teaching machines through body language. More importantly, our approach learns the meaning of these cues as they are produced by the user during ongoing interactions, thereby alleviating the problem of creating an extensive dataset or hand-engineering their meaning for the machine. By learning the meaning of these cues, users can teach their machines to behave according to their preferences. We would like to point out here that this is a significant contribution and is vastly different from conventional human-machine interaction approaches.

7.2 How Does Our Approach Differ from Approaches that Learn from User-Generated Rewards?

In the previous section, we discussed the significance of our prospective body language approach over existing human-machine interaction approaches that use body language. In this section, we draw out the important differences between our approach and those that learn from user-generated rewards. Many approaches, like the ones introduced by Knox and Stone (2009) and by Iturrate et al. (2015), were developed recently for improving human-machine interaction through reinforcement learning. These techniques allow the user to directly manipulate the existing reward channel through user-generated rewards, thereby teaching the machine to act according to their preferences. In these approaches, the user generates rewards, which complement the reward signal from the environment. Using this combined reward signal, the agent learns and adapts its behavior in order to maximize its rewards. These approaches to human-machine interaction require constant

supervision from the user in the form of user-generated rewards throughout the operation of the machine. This contradicts the objective of human-machine interaction, which is that the agent needs to complete a task according to the user's preferences without involving much effort from the user. More specifically, these interaction techniques should not require constant supervision from their users. In order to meet this requirement, the machines need to have some level of autonomy and not expect expensive feedback throughout the course of their operation. A simple way of meeting these requirements is by adding the user's body language into the existing state space, which is exactly what our approach does. As a result, the agent learns to predict its future rewards directly from the user's body language, which is a powerful source of feedback compared to user-generated rewards. More importantly, these body language cues are produced without much effort, as opposed to the process of producing user-generated rewards.

7.3 Do People Produce Non-Verbal Cues While Interacting With Their Machines?

In the previous chapters, we introduced our approach, where users can teach their machines using body language cues without involving much effort from the users. As opposed to existing methods, our approach allows people to use high-bandwidth channels for communicating feedback to their machines. However, in our experiments, we assumed that the users produced facial expression cues during their interactions with their agents. Naturally, this brings up the following question: do people produce facial expressions or other body language cues while interacting with their agents? People often produce subtle body language cues while communicating with their peers. Many research works in psychology and affective neuroscience (Adolphs, 2002; De Gelder, 2006; Whalen et al., 2013; Jack & Schyns, 2015; Scheider et al., 2016) have studied this specific phenomenon. More importantly, they study why people produce such non-verbal cues and how they

influence their observers. It is reasonable to think that people do not produce such subtle cues while interacting with machines. However, recently, Fridman et al. (2017) discovered that people produce significant body language while interacting with their machines. Specifically, their work studied the predictive power of various body language cues produced naturally by a user while driving a semi-autonomous car through different environments. More importantly, their results suggest that these non-verbal cues were powerful enough to predict different aspects of the environment and the car, like the traffic density, cruise control, road conditions, weather, and proximity to an intersection, among others. They created a dataset with 100 users logging 2 million vehicle miles, which resulted in 43,000 hours of supervised training data. The entire process of creating this dataset and subsequently labeling the data took about 13 months. Another research work, by Abdić et al. (2016), studied how useful these body language cues were for predicting user satisfaction. They used a dataset obtained from a user study, which involved the users driving a car and operating a voice-based navigation system. The users' faces, along with their conversations with the navigation system, were recorded and labeled with their satisfaction level. The authors trained a supervised learning approach over this dataset and found that it predicted user satisfaction with significant accuracy. More importantly, they reported that the visual stream containing the driver's face was more predictive of satisfaction than the conversation with the navigation system. From these research works, it is clear that people do produce natural non-verbal cues while interacting with their machines. More recently, Fridman et al. (2017a) and Fridman et al. (2017b) have begun to build upon these results by designing better user interfaces for semi-autonomous vehicles.

7.4 How Does Our Approach Relate to Affective Computing?

Affective computing is defined as the study and development of systems that can recognize, interpret, process and simulate human affects, where an affect is the experience of a feeling or emotion. Sometimes, affect is also defined as facial, vocal or gestural behavior that serves as a response to a stimulus. The motivation of affective computing is to simulate empathy in computers. In short, affective computing machines need to interpret the state of mind of people and adapt their behavior accordingly. This modern field was formalized by Rosalind Picard in a seminal paper published in 1995. From the definitions of affective computing, it is clear that our prospective body language approach fits these requirements. Specifically, our approach can be seen as an initial step towards building a complete system that identifies the user's state of mind from their body language and subsequently adapts its behavior towards them. It is also important to point out here that the existing research works in affective computing rely on supervised learning approaches for recognizing different affects from the user. Surprisingly, our work is the first to comprehensively address this affective computing problem without making any assumptions about the user's body language. Also, reinforcement learning seems to be a promising approach for affective computing.

7.5 Better Feature Representations for Nuanced Body Language

The most important aspect of our approach that needs to be improved upon is the feature representation that forms the state representation of the agent. In the experiments described in this thesis, we processed facial landmarks of the interacting user, which were extracted at each time step. These facial landmarks formed the feature representation for the agent. Such a feature

representation is not powerful enough to encode cues that extend over many time steps, like nodding the head or curling the lips. People often produce such nuanced body language cues, which carry significant communicative value for the agent. By using better feature representations, the agent can learn from such nuanced cues and, as a result, produce a better user experience. A better approach towards building powerful feature representations would draw on the recent advances made in the field of representation learning. In recent times, the trend is to learn features directly from the data by minimizing some objective function. These representation learning approaches involve constructing a hierarchical network structure, called a neural network, whose parameters are tuned through gradient-based optimization algorithms to solve a given task. This hierarchical organization enables these networks to learn nonlinear feature representations directly from the data, substantially improving performance. This approach of learning hierarchical representations grounded directly in the data has achieved state-of-the-art results in many domains, like translating languages (Cho et al., 2014), predicting patient survival (Miotto et al., 2016), playing poker (Moravčík et al., 2017), Go (Silver et al., 2016) and Atari games (Mnih et al., 2015). The performance improvements obtained through these hierarchical representation learning methods are many-fold over conventional hand-engineered methods. We hypothesize that, by using neural networks, our approach can learn temporally extended and nuanced body language cues produced by the user, resulting in more powerful interaction techniques between people and their machines.
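As one possible direction, the following is an illustrative sketch (not part of the thesis) of a small recurrent network that maps a sequence of normalized facial-landmark frames to a compact feature vector, which could replace the hand-engineered tile-coded representation. The layer sizes and the choice of an LSTM are assumptions made for illustration.

    import torch
    import torch.nn as nn

    class LandmarkEncoder(nn.Module):
        """Encode a sequence of facial-landmark frames into one feature vector."""

        def __init__(self, n_landmarks=23, hidden=64, n_features=32):
            super().__init__()
            self.rnn = nn.LSTM(input_size=2 * n_landmarks, hidden_size=hidden,
                               batch_first=True)
            self.head = nn.Linear(hidden, n_features)

        def forward(self, frames):
            # frames: (batch, time, 2 * n_landmarks) flattened landmark coordinates
            _, (h, _) = self.rnn(frames)
            return self.head(h[-1])            # one feature vector per sequence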

7.6 Integrating Different Modalities

Our approach of teaching a machine using prospective body language was motivated by the importance of improving human-machine interaction by studying human communication. However, in order to achieve its complete potential, we need to integrate this with speech and other body language.

People produce many different body language cues that range across multiple modalities, and these cues usually differ from person to person. For example, people tend to accentuate their speech with their hands while they speak. Sometimes, the posture exhibited by people also carries valuable information about their state of mind. Learning to understand such body language allows machines to better understand their user and their preferences. Integrating different modalities would allow the machines to learn more about their users and will help the research community make significant progress towards the ambitious goal of intelligence amplification.

Chapter 8

Conclusion

Body language plays a key role in human communication. These non-verbal cues complement speech in human communication and simplify the process of communicating with our peers. More importantly, these cues are produced naturally during the process of communication and usually have communicative value. Our peers, on observing these cues, subsequently adapt their behavior towards us. An important goal of artificial intelligence is to amplify the cognitive and physical capabilities of people through machines. This is called intelligence amplification. As a first step towards this grand goal, we need to design sustainable forms of human-machine interaction that closely adhere to the principles underlying human communication. Specifically, we need to design approaches that allow humans to communicate naturally with their machines, similar to how people communicate with each other. Many of the existing approaches to human-machine interaction that use body language assume that the meaning of different cues is given to the machine by some external source. This is not a sound assumption to make, because these cues usually vary across people. As a result, these existing approaches require the user to actively translate their non-verbal cues in order to fit the system design. This process of translation required from the user is unnatural and causes cognitive load on the user, making these existing approaches unsustainable. More importantly, these existing approaches do not address the goal of this thesis: to create approaches that learn from people

without their really trying. In this thesis, we introduced our prospective body language approach, which allows people to teach their machines using whatever body language cues they produce during the process of interaction. More specifically, these cues indicate their satisfaction with the machine's behavior. First, the agent learns to predict its future rewards by mapping the user's body language to the occasionally generated explicit feedback. This mapping is achieved through a value function. Second, after this value function reliably predicts its future rewards from the cues, the agent perceives these cues as useful evaluative feedback signals, informing the agent about its performance. More importantly, the agent utilizes this feedback to adapt its behavior so as to maximize its rewards and user satisfaction. This process of influencing behavior through subtle cues is similar to how people adapt their behaviors in human communication. As opposed to the existing approaches, our method does not require any supervised learning datasets for recognizing the subtle cues produced by the user. Crucially, our method does not assume anything regarding the user's body language, as it learns the meaning of these cues through ongoing interactions with the user. As a result, our approach does not require the user to translate their cues according to the system design. We believe this to be the first work to directly address the overall goal of this thesis, where an implemented system learns from people without involving much effort from them. We expect that our prospective body language approach can be easily extended to many real-world human-machine interaction domains, like intelligent prostheses and self-driving cars. In the near future, we expect our general human-machine interaction approach to improve the quality of life for people by amplifying or augmenting their physical and cognitive capabilities through machines.

Bibliography

Abbeel, P., Coates, A., Quigley, M., & Ng, A. Y. (2007). An application of reinforcement learning to aerobatic helicopter flight. In Advances in neural information processing systems (pp. 1-8).

Abbeel, P., & Ng, A. Y. (2004). Apprenticeship learning via inverse reinforcement learning. In Proceedings of the twenty-first international conference on Machine learning (p. 1). ACM.

Abdić, I., Fridman, L., McDuff, D., Marchi, E., Reimer, B., & Schuller, B. (2016). Driver frustration detection from audio and video in the wild. In KI 2016: Advances in Artificial Intelligence: 39th Annual German Conference on AI, Klagenfurt, Austria, September 26-30, 2016, Proceedings (Vol. 9904, p. 237). Springer.

Adolphs, R. (2002). Trust in the brain. Nature neuroscience, 5(3), 192-193.

Alizadeh, T., Calinon, S., & Caldwell, D. G. (2014). Learning from demonstrations with partially observable task parameters. In Robotics and Automation (ICRA), 2014 IEEE International Conference on (pp. 3309-3314). IEEE.

Argall, B., Browning, B., & Veloso, M. (2007). Learning by demonstration with critique from a human teacher. In Human-Robot Interaction (HRI), 2007 2nd ACM/IEEE International Conference on (pp. 57-64). IEEE.

Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE transactions on systems, man, and cybernetics, (5), 834-846.

Bhatnagar, S., Sutton, R., Ghavamzadeh, M., & Lee, M. (2009). Natural actor-critic algorithms. Automatica, 45(11).

Breazeal, C. (1998). Regulating human-robot interaction using ‘emotions’, ‘drives’, and facial expressions. In Proceedings of Autonomous Agents (Vol. 98, pp. 14-21).

Breazeal, C. L. (2004). Designing sociable robots. MIT press.

Breazeal, C. (2009). Role of expressive behaviour for robots that learn from people. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 364(1535), 3527-3538.

Cakmak, M., & Thomaz, A. L. (2012). Designing robot learners that ask good questions. In Proceedings of the seventh annual ACM/IEEE international conference on Human-Robot Interaction (pp. 17-24). ACM.

Chen, D. L., & Mooney, R. J. (2011). Learning to Interpret Natural Language Navigation Instructions from Observations. In AAAI (Vol. 2, pp. 1-2).

Chernova, S., & Veloso, M. (2008). Teaching collaborative multi-robot tasks through demonstration. In Humanoid Robots, 2008. Humanoids 2008. 8th IEEE-RAS International Conference on (pp. 385-390). IEEE.

Cobo, L. C., Isbell Jr, C. L., & Thomaz, A. L. (2012). Automatic task decomposition and state abstraction from demonstration. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems-Volume 1 (pp. 483-490). International Foundation for Autonomous Agents and Multiagent Systems.

Cruz, F., Twiefel, J., Magg, S., Weber, C., & Wermter, S. (2015). Interactive reinforcement learning through speech guidance in a domestic scenario. In Neural Networks (IJCNN), 2015 International Joint Conference on (pp. 1-8). IEEE.

Darwin, C. (1998). The expression of the emotions in man and animals. Oxford University Press, USA.

De Gelder, B. (2006). Towards the neurobiology of emotional body language. Nature Reviews Neuroscience, 7(3), 242.

Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.

Faria, B. M., Reis, L. P., & Lau, N. (2012). Cerebral palsy EEG signals classification: Facial expressions and thoughts for driving an intelligent wheelchair. In Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on (pp. 33-40). IEEE.

Faria, P. M., Braga, R. A., Valgode, E., & Reis, L. P. (2007). Interface framework to drive an intelligent wheelchair using facial expressions. In Industrial Electronics, 2007. ISIE 2007. IEEE International Symposium on (pp. 1791-1796). IEEE.

Fridman, Toyoda, Seaman, Seppelt, Angell, Lee, & Mehler. (2017). What Can Be Predicted from 6 Seconds of Driver Glances? In CHI.

Fridman, Reimer, Mehler, & Freeman. (2017). Cognitive Load Estimation in a Large On-Road Driving Dataset. (under review)

Ferreira, A., Silva, R. L., Celeste, W. C., Bastos Filho, T. F., & Sarcinelli Filho, M. (2007). Human–machine interface based on muscular and brain signals applied to a robotic wheelchair. In Journal of Physics: Conference Series (Vol. 90, No. 1, p. 012094). IOP Publishing.

Galán, F., Nuttin, M., Lew, E., Ferrez, P. W., Vanacker, G., Philips, J., & Millán, J. D. R. (2008). A brain-actuated wheelchair: asynchronous and non-invasive brain–computer interfaces for continuous control of robots. Clinical Neurophysiology, 119(9), 2159-2169.

Graham, J. A., & Argyle, M. (1975). A cross-cultural study of the communication of extra-verbal meaning by gestures. International Journal of Psychology, 10(1), 57-67.

Hollender, N., Hofmann, C., Deneke, M., & Schmitz, B. (2010). Integrating cognitive load theory and concepts of human–computer interaction. Computers in Human Behavior, 26(6), 1278-1288.

Isbell, C. L., Kearns, M., Kormann, D., Singh, S., & Stone, P. (2000). Cobot in LambdaMOO: A social statistics agent. In AAAI/IAAI (pp. 36-41).

Iturrate, I., Chavarriaga, R., Montesano, L., Minguez, J., & Millán, J. D. R. (2015). Teaching brain-machine interfaces as an alternative paradigm to neuroprosthetics control. Scientific reports, 5, 13893.

Iturrate, I., Montesano, L., & Minguez, J. (2010). Robot reinforcement learning using EEG-based reward signals. In Robotics and Automation (ICRA), 2010 IEEE International Conference on (pp. 4822-4829). IEEE.

Jack, R. E., & Schyns, P. G. (2015). The human face as a dynamic tool for social communication. Current Biology, 25(14), R621-R634.

Judah, K., Roy, S., Fern, A., & Dietterich, T. G. (2010). Reinforcement Learning Via Practice and Critique Advice. In AAAI.

Kazemi, V., & Sullivan, J. (2014). One millisecond face alignment with an ensemble of regression trees. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1867-1874).

Kaochar, T., Peralta, R., Morrison, C., Fasel, I., Walsh, T., & Cohen, P. (2011). Towards understanding how humans teach robots. User modeling, adaption and personalization, 347-352.

Kim, E. S., & Scassellati, B. (2007). Learning to refine behavior using prosodic feedback. In Development and Learning, 2007. ICDL 2007. IEEE 6th International Conference on (pp. 205-210). IEEE.

Knox, W. B., & Stone, P. (2009). Interactively shaping agents via human reinforcement: The TAMER framework. In Proceedings of the fifth international conference on Knowledge capture (pp. 9-16). ACM.

Knox, W. B., & Stone, P. (2015). Framing reinforcement learning from human reward: Reward positivity, temporal discounting, episodicity, and performance. Artificial Intelligence, 225, 24-50.

Knox, W. B., Glass, B. D., Love, B. C., Maddox, W. T., & Stone, P. (2012). How humans teach agents. International Journal of Social Robotics, 4(4), 409-421.

Koenig, N. P., & Mataric, M. J. (2012). Training Wheels for the Robot: Learning from Demonstration Using Simulation. In AAAI Fall Symposium: Robots Learning Interactively from Human Teachers.

Konda, V. R., & Tsitsiklis, J. N. (2000). Actor-critic algorithms. In Advances in neural information processing systems (pp. 1008-1014).

Lievesley, R., Wozencroft, M., & Ewins, D. (2011). The Emotiv EPOC neuroheadset: an inexpensive method of controlling assistive technologies using facial expressions and thoughts? Journal of Assistive Technologies, 5(2), 67-82.

Liu, K., & Picard, R. W. (2003). Subtle expressivity in a robotic computer. In CHI 2003 Workshop on Subtle Expressiveness in Characters and Robots.

Mathewson, K. W., & Pilarski, P. M. (2017). Reinforcement Learning based Embodied Agents Modelling Human Users Through Interaction and Multi-Sensory Perception. arXiv preprint arXiv:1701.02369.

Mehler, B., Reimer, B., Dobres, J., Foley, J., & Ebe, K. (2016). Additional Findings on the Multi-Modal Demands of “Voice-Command” Interfaces (No. 2016-01-1428). SAE Technical Paper.

Miotto, R., Li, L., Kidd, B. A., & Dudley, J. T. (2016). Deep patient: An unsupervised representation to predict the future of patients from the electronic health records. Scientific reports, 6, 26094.

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., & Petersen, S. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.

Moravčík, M., Schmid, M., Burch, N., Lisý, V., Morrill, D., Bard, N., Davis, T., Waugh, K., Johanson, M., & Bowling, M. (2017). DeepStack: Expert-level artificial intelligence in no-limit poker. arXiv preprint arXiv:1701.01724.

Pezzulo, G., Donnarumma, F., & Dindo, H. (2013). Human sensorimotor communication: A theory of signaling in online social interactions. PloS one, 8(11), e79876.

Pfungst, O. (1911). Clever Hans: (The horse of Mr. von Osten.) A contribution to experimental animal and human psychology. Holt, Rinehart and Winston.

Pilarski, P. M., Dawson, M. R., Degris, T., Fahimi, F., Carey, J. P., & Sutton, R. S. (2011). Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning. In Rehabilitation Robotics (ICORR), 2011 IEEE International Conference on (pp. 1-7). IEEE.

Pilarski, P. M., Sutton, R. S., & Mathewson, K. W. (2015). Prosthetic Devices as Goal-Seeking Agents. In 2nd Workshop on Present and Future of Non-Invasive Peripheral-Nervous-System Machine Interfaces, Singapore.

Rechy-Ramirez, E. J., Hu, H., & McDonald-Maier, K. (2012). Head movements based control of an intelligent wheelchair in an indoor environment. In Robotics and Biomimetics (ROBIO), 2012 IEEE International Conference on (pp. 1464-1469). IEEE.

Reis, L. P., Braga, R. A., Sousa, M., & Moreira, A. P. (2009). IntellWheels MMI: A flexible interface for an intelligent wheelchair. In Robot Soccer World Cup (pp. 296-307). Springer, Berlin, Heidelberg.

Rosselli, M., & Ardila, A. (2003). The impact of culture and education on non-verbal neuropsychological measurements: A critical review. Brain and cognition, 52(3), 326-333.

Schulman, J., Ho, J., Lee, C., & Abbeel, P. (2013). Generalization in robotic manipulation through the use of non-rigid registration. In Proceedings of the 16th International Symposium on Robotics Research (ISRR).

Scheider, L., Waller, B. M., Oña, L., Burrows, A. M., & Liebal, K. (2016). Social use of facial expressions in hylobatids. PloS one, 11(3), e0151733.

She, L., Cheng, Y., Chai, J. Y., Jia, Y., Yang, S., & Xi, N. (2014). Teaching robots new actions through natural language instructions. In Robot and Human Interactive Communication, 2014 RO-MAN: The 23rd IEEE International Symposium on (pp. 868-873). IEEE.

Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., ... & Dieleman, S. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484-489.

Suay, H. B., & Chernova, S. (2011). Effect of human guidance and state space size on interactive reinforcement learning. In RO-MAN, 2011 IEEE (pp. 1-6). IEEE.

Sutton, R. S., & Barto, A. G. (2017). Reinforcement learning: An introduction. (in press).

Sutton, R. S., Mahmood, A. R., & White, M. (2016). An emphatic approach to the problem of off-policy temporal-difference learning. The Journal of Machine Learning Research, 17(1), 2603-2631.

Tanaka, K., Matsunaga, K., & Wang, H. O. (2005). Electroencephalogram-based control of an electric wheelchair. IEEE transactions on robotics, 21(4), 762-766.

Thomaz, A. L., & Breazeal, C. (2006). Reinforcement learning with human teachers: Evidence of feedback and guidance with implications for learning performance. In AAAI (Vol. 6, pp. 1000-1005).

Thomaz, A. L., & Breazeal, C. (2008). Teachable robots: Understanding human teaching behavior to build more effective robot learners. Artificial Intelligence, 172(6-7), 716-737.

van Seijen, H., Mahmood, A. R., Pilarski, P. M., Machado, M. C., & Sutton, R. S. (2016). True online temporal-difference learning. Journal of Machine Learning Research, 17(145), 1-40.

van Seijen, H. (2016). Effective multi-step temporal-difference learning for non-linear function approximation. arXiv preprint arXiv:1608.05151.

Vasan, G., & Pilarski, P. M. (2017). Learning from Demonstration: Teaching a Myoelectric Prosthesis with an Intact Limb via Reinforcement Learning. In Rehabilitation Robotics (ICORR), 2017 IEEE International Conference on. IEEE.

Whalen, P. J., Raila, H., Bennett, R., Mattek, A., Brown, A., Taylor, J., & Palmer, A. (2013). Neuroscience and facial expressions of emotion: The role of amygdala–prefrontal interactions. Emotion Review, 5(1), 78-83.

Williams III, T. W. (2011). Progress on stabilizing and controlling powered upper-limb prostheses. Journal of Rehabilitation Research & Development, 48(6), ix-ix.

Yammiyavar, P., Clemmensen, T., & Kumar, J. (2008). Influence of cultural background on non-verbal communication in a usability testing situation. International Journal of Design, 2(2).
