Making and Keeping Probabilistic Commitments for Trustworthy Multiagent Coordination
Total Page:16
File Type:pdf, Size:1020Kb
Making and Keeping Probabilistic Commitments for Trustworthy Multiagent Coordination by Qi Zhang A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science and Engineering) in The University of Michigan 2020 Doctoral Committee: Professor Satinder Singh Baveja, Co-Chair Professor Edmund H. Durfee, Co-Chair Professor Richard L. Lewis Assistant Professor Arunesh Sinha Qi Zhang [email protected] ORCID iD: 0000-0002-8562-5987 c Qi Zhang 2020 ACKNOWLEDGEMENTS First and foremost I would like to give my sincerest and warmest thanks to my co-advisors, Edmund Durfee and Satinder Singh, for their wisdom and patience in training me to become a researcher. They have also been extraordinarily supportive of my academic career after PhD. I was extremely lucky to be your student. I would like to give thanks to Rick Lewis, who served on my doctoral committee, for advising me on an early project that did not end up in this thesis but I am particularly proud of. I would like to thank Arunesh Sinha, who also served on my committee, for giving thoughtful feedback in my dissertation defense and writing. I would like to express my gratitude to Nan Jiang, Xiaoxiao Guo, Junhyuk Oh, Shun Zhang, Aditya Modi, Janarthanan Rajendran, Vivek Veeriah, John Holler, Max Smith, Zeyu Zheng, Ethan Brooks, Chris Grimm, Wilka Carvalho, Risto Vuorio, and other members in the RL lab, with whom I spent a wonderful time as a PhD student. In particular, I am thankful to Xiaoxiao who introduced me to IBM Research, where Murray Campbell hosted me as an intern. I truly enjoyed discussions and conversa- tions with Murray, Gerald Tesauro, Miao Liu, Ashwin Kalyan, and fellow interns at IBM. I would like to give my heartiest thanks to my parents, my twin brother, and my grandparents for their love and support. Finally, I would like to thank Xiaoyu, my wife, for everything you have brought to me. As I am writing down these words, we are celebrating your birthday. Happy birthday to my little sweetheart. ii TABLE OF CONTENTS ACKNOWLEDGEMENTS :::::::::::::::::::::::::: ii LIST OF FIGURES ::::::::::::::::::::::::::::::: v LIST OF TABLES :::::::::::::::::::::::::::::::: vii ABSTRACT ::::::::::::::::::::::::::::::::::: viii CHAPTER I. Introduction ..............................1 1.1 Problems and Contributions...................2 1.2 Thesis Structure.........................7 II. Background ...............................8 2.1 Markov Decision Processes...................8 2.1.1 MDP Planning.....................9 2.2 Bayesian Model Uncertainty................... 10 2.3 Probabilistic Commitments................... 11 2.3.1 Provider-Recipient Decision-Making Model..... 11 2.3.2 Predictive Commitment Semantics.......... 12 2.3.3 Commitment-based Multiagent Coordination.... 12 III. Problem Formulation ......................... 16 3.1 The Provider's Adherence under Model Uncertainty..... 17 3.2 The Recipient's Robust Interpretation............. 22 3.3 Efficient Formulation of Cooperative Commitments...... 25 IV. Trustworthy Adherence to Probabilistic Commitments .... 28 4.1 Problem Statement Recapitulation............... 29 4.2 Methods............................. 30 iii 4.3 Empirical Study......................... 48 4.4 Summary............................. 61 V. Robust Interpretation of Probabilistic Commitments ..... 62 5.1 Problem Statement Recapitulation............... 62 5.2 Achievement and Maintenance................. 64 5.3 Bounding the Suboptimality.................. 66 5.3.1 Minimal Enablement Duration............ 68 5.3.2 Alternative Influence Approximations........ 73 5.4 Empirical Study......................... 77 5.4.1 Suboptimality for General Commitments...... 77 5.4.2 Suboptimality for Value Maximizer Commitments. 80 5.5 Summary............................. 85 VI. Efficient Formulation of Cooperative Probabilistic Commit- ments ................................... 86 6.1 Cooperative Probabilistic Commitments............ 86 6.2 Structure of the Probabilistic Commitment Space....... 87 6.2.1 Properties of the Provider's Commitment Value.. 88 6.2.2 Properties of the Recipient's Commitment Value.. 90 6.3 Centralized Formulation of Cooperative Commitments.... 92 6.3.1 Empirical Evaluation................. 93 6.4 Querying Approach for Decentralized Formulation of Cooper- ative Commitments....................... 98 6.4.1 Structure of the Commitment Query Space..... 102 6.5 Efficient Commitment Query Formulation........... 104 6.5.1 Empirical Evaluation................. 104 6.6 Summary............................. 111 VII. Conclusion ............................... 113 7.1 Discussion of Future Work................... 115 7.2 Closing Remarks......................... 118 BIBLIOGRAPHY :::::::::::::::::::::::::::::::: 119 iv LIST OF FIGURES Figure 2.1 The linear program for MDP planning................. 10 3.1 Illustration of the provider's commitment-constrained policy opti- mization problem............................. 18 3.2 The linear program for the provider's planning............ 19 4.1 CCFL program.............................. 32 4.2 CCNL program.............................. 34 4.3 Illustration of CCL............................ 35 4.4 CCL program............................... 37 4.5 The example that verifies Observation IV.1.............. 40 4.6 Illustration of CCIL........................... 43 4.7 CCL program in the reward uncertainty only case........... 44 4.8 Example as a proof of Theorem IV.5.................. 47 4.9 Windy L-Maze.............................. 49 4.10 Food-or-Fire............................... 53 4.11 Expected cumulative reward in Food-or-Fire domain as a function of the commitment and the belief-update lookahead boundary..... 54 4.12 RockSample instances.......................... 55 4.13 Results of CCL on Change Detection.................. 59 5.1 Candidate influences for an achievement commitment and a mainte- nance commitment............................ 66 5.2 Minimal enablement duration for an achievement commitment and a maintenance commitment........................ 69 5.3 1D Walk.................................. 71 5.4 Suboptimality in 1D Walk........................ 79 6.1 Centralized commitment formulation comparing the even, the DP, and the breakpoints discretizations................... 95 6.2 Visualizations of commitment value functions............. 97 6.3 Maximum feasible commitment probability............... 98 6.4 Centralized commitment formulation with the provider's action a+. 99 6.5 Profiled runtime and discretization density............... 100 6.6 Illustration of the querying approach.................. 101 6.7 Decentralized commitment formulation comparing the even, the DP, and the breakpoints discretizations................... 107 v 6.8 Comparison of the optimal, the greedy, and the random commitment queries................................... 109 6.9 EUS of the greedy query for the uniform, random, and Gaussian priors.110 6.10 EUS of the greedy query in the second round of querying....... 112 vi LIST OF TABLES Table 4.1 Evaluation of Non-Prescriptive Semantics, Prescriptive Non-Probabilistic Semantics, and Prescriptive Probabilistic Commitment on the Windy L-maze domain.............................. 51 4.2 Results on RockSample(2,2)....................... 56 4.3 Results on RockSample(4,4)....................... 57 4.4 Results on Change Detection...................... 60 5.1 1D Walk Examples for Theorem V.3................. 76 5.2 Suboptimality for maximizer commitments (without action a+ for the provider)............................... 82 5.3 Suboptimality for maximizer commitments (p+ = 0)......... 82 5.4 Suboptimality for maximizer commitments (p+ = 0:5)........ 83 5.5 Suboptimality for maximizer commitments (p+ = 0:9)........ 83 6.1 Averaged discretization size per commitment time........... 96 vii ABSTRACT In a large number of real world domains, such as the control of autonomous vehicles, team sports, medical diagnosis and treatment, and many others, multiple autonomous agents need to take actions based on local observations, and are in- terdependent in the sense that they rely on each other to accomplish tasks. Thus, achieving desired outcomes in these domains requires interagent coordination. The form of coordination this thesis focuses on is commitments, where an agent, referred to as the commitment provider, specifies guarantees about its behavior to another, referred to as the commitment recipient, so that the recipient can plan and execute accordingly without taking into account the details of the provider's behavior. This thesis grounds the concept of commitments into decision-theoretic settings where the provider's guarantees might have to be probabilistic when its actions have stochas- tic outcomes and it expects to reduce its uncertainty about the environment during execution. More concretely, this thesis presents a set of contributions that address three core issues for commitment-based coordination: probabilistic commitment adherence, in- terpretation, and formulation. The first contribution is a principled semantics for the provider to exercise maximal autonomy that responds to evolving knowledge about the environment without violating its probabilistic commitment, along with a family of algorithms for the provider to construct policies that provably respect the semantics and make explicit tradeoffs between computation cost and plan quality. The second contribution consists of theoretical analyses and empirical studies that improve our understanding of the recipient's