SPOTTER: Extending Symbolic Planning Operators Through Targeted Reinforcement Learning

Total Page:16

File Type:pdf, Size:1020Kb

SPOTTER: Extending Symbolic Planning Operators Through Targeted Reinforcement Learning SPOTTER: Extending Symbolic Planning Operators through Targeted Reinforcement Learning Vasanth Sarathy∗ Daniel Kasenberg∗ Shivam Goel Smart Information Flow Technologies Tufts University. Tufts University. Lexintgon, MA Medford, MA Medford, MA [email protected] [email protected] [email protected] Jivko Sinapov Matthias Scheutz Tufts University. Tufts University. Medford, MA Medford, MA [email protected] [email protected] ABSTRACT used to complete a variety of tasks), available human knowledge, Symbolic planning models allow decision-making agents to se- and interpretability. However, the models are often handcrafted, quence actions in arbitrary ways to achieve a variety of goals in difficult to design and implement, and require precise formulations dynamic domains. However, they are typically handcrafted and that can be sensitive to human error. Reinforcement learning (RL) tend to require precise formulations that are not robust to human approaches do not assume the existence of such a domain model, error. Reinforcement learning (RL) approaches do not require such and instead attempt to learn suitable models or control policies models, and instead learn domain dynamics by exploring the en- by trial-and-error interactions in the environment [28]. However, vironment and collecting rewards. However, RL approaches tend RL approaches tend to require a substantial amount of training in to require millions of episodes of experience and often learn poli- moderately complex environments. Moreover, it has been difficult cies that are not easily transferable to other tasks. In this paper, to learn abstractions to the level of those used in symbolic planning we address one aspect of the open problem of integrating these approaches through low-level reinforcement-based exploration. approaches: how can decision-making agents resolve discrepancies Integrating RL and symbolic planning is highly desirable, enabling in their symbolic planning models while attempting to accomplish autonomous agents that are robust, resilient and resourceful. goals? We propose an integrated framework named SPOTTER that Among the challenges in integrating symbolic planning and RL, uses RL to augment and support (“spot”) a planning agent by dis- we focus here on the problem of how partially specified symbolic covering new operators needed by the agent to accomplish goals models can be extended during task performance. Many real-world that are initially unreachable for the agent. SPOTTER outperforms robotic and autonomous systems already have preexisting models pure-RL approaches while also discovering transferable symbolic and programmatic implementations of action hierarchies. These knowledge and does not require supervision, successful plan traces systems are robust to many anticipated situations. We are inter- or any a priori knowledge about the missing planning operator. ested in how these systems can adapt to unanticipated situations, autonomously and with no supervision. KEYWORDS In this paper, we present SPOTTER (Synthesizing Planning Op- erators through Targeted Exploration and Reinforcement). Unlike Planning, Reinforcement Learning other approaches to action model learning, SPOTTER does not have ACM Reference Format: access to successful symbolic traces. Unlike other approaches to Vasanth Sarathy, Daniel Kasenberg, Shivam Goel, Jivko Sinapov, and Matthias learning low-level implementation of symbolic operators, SPOT- Scheutz. 2021. SPOTTER: Extending Symbolic Planning Operators through TER does not know a priori what operators to learn. Unlike other Targeted Reinforcement Learning. In Proc. of the 20th International Confer- approaches which use symbolic knowledge to guide exploration, ence on Autonomous Agents and Multiagent Systems (AAMAS 2021), London, arXiv:2012.13037v1 [cs.AI] 24 Dec 2020 SPOTTER does not assume the existence of partial plans. UK, May 3–7, 2021, IFAAMAS, 10 pages. We focus on the case where the agent is faced with a symbolic 1 INTRODUCTION goal but has neither the necessary symbolic operator to be able to synthesize a plan nor its low-level controller implementation. Symbolic planning approaches focus on synthesizing a sequence of The agent must invent both. SPOTTER leverages the idea that the operators capable of achieving a desired goal [9]. These approaches agent can proceed by planning alone unless there is a discrepancy rely on an accurate high-level symbolic description of the dynamics between its model of the environment and the environment’s dy- of the environment. Such a description affords these approaches namics. In our approach, the agent attempts to plan to the goal. the benefit of generalizability and abstraction (the model canbe When no plan is found, the agent explores its environment using ∗The first two authors contributed equally. an online explorer looking to reach any state from which it can plan to the goal. When such a state is reached, the agent spawns of- Proc. of the 20th International Conference on Autonomous Agents and Multiagent Systems fline “subgoal” RL learners to learn policies which can consistently (AAMAS 2021), U. Endriss, A. Nowé, F. Dignum, A. Lomuscio (eds.), May 3–7, 2021, London, achieve those conditions. As the exploration agent acts, the subgoal UK. © 2021 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved. learners learn from its trajectory in parallel. The subgoal learners regularly attempt to generate symbolic preconditions from which be the set of propositional state variables 푓 . A fluent state 휎 is a their candidate operators have high value; if such preconditions complete assignment of values to all fluents in 퐹. That is, 휎 = 퐹 , | | | | can be found, the operator is added with those preconditions into and 휎 includes positive literals (푓 ) and negative literals ( 푓 ). 휎 ¬ 0 the planning domain. We evaluate the approach with experiments represents the initial fluent state. We assume full observability of in a gridworld environment in which the agent solves three puzzles the initial fluent state. We can define a partial fluent state 휎˜ to refer involving unlocking doors and moving objects. to a partial assignment of values to the fluents 퐹. The goal condition In this paper, our contributions are as follows: a framework for is represented as a partial fluent state 휎˜푔. We define 퐹 to be the L( ) integrated RL and symbolic planning; algorithms for solving prob- set of all partial fluent states with fluents in 퐹. lems in finite, deterministic domains in which the agent has partial The operators under this formalism are open-world. A partial domain knowledge and must reach a seemingly unreachable goal planning operator can be defined as 표 = 푝푟푒 표 , eff 표 , 푠푡푎푡푖푐 표 . ⟨ ( ) ( ) ( )⟩ state; and experimental results showing substantial improvements 푝푟푒 표 퐹 are the preconditions, and eff 표 퐹 are the ( ) ∈ L( ) ( ) ∈ L( ) over baseline approaches in terms of cumulative reward, rate of effects, 푠푡푎푡푖푐 표 퐹 are those fluents whose values are known ( ) ⊆ learning and transferable knowledge learned. not to change during the execution of the operator. An operator 표 is applicable in a partial fluent state 휎˜ if 푝푟푒 표 휎˜. The result ( ) ⊆ 1.1 Running example: GridWorld of executing an operator 표 from a partial fluent state 휎˜ is given by the successor function 훿 휎,˜ 표 = eff 표 푟푒푠푡푟푖푐푡 휎,˜ 푠푡푎푡푖푐 표 , Throughout this paper, we will use as a running example a Grid- ( ) ( ) ∪ ( ( )) where 푟푒푠푡푟푖푐푡 휎,˜ 퐹 is the partial fluent state consisting only of World puzzle (Figure 1) which an agent must unlock a door which is ( ′) blocked by a ball. The agent’s planning domain abstracts out the no- 휎˜’s values for fluents in 퐹 ′. The set of all partial fluent states defined tion of specific grid cells, and so all navigation is framed in termsof through repeated application of the successor function beginning at going to specific objects. Under this abstraction, no action sequence 휎˜ provides the set of reachable partial fluent states Δ휎˜ . A complete allows the agent to “move the ball out of the way”. Planning actions operator is an operator표˜ where all the fluents are fully accounted for, namely, 푓 퐹, 푓 eff 표˜ eff 표˜ 푠푡푎푡푖푐 표˜ , where eff 표 = are implemented as handcrafted programs that navigate the puzzle ∀ ∈ ∈ + ( ) ∪ − ( ) ∪ ( ) + ( ) 푓 퐹 : 푓 eff 표 and eff 표 = 푓 퐹 : 푓 eff 표 . We and execute low-level actions (up, down, turn left/right, pickup). { ∈ ∈ ( )} − ( ) { ∈ ¬ ∈ ( )} Because the door is blocked, the agent cannot generate a symbolic assume all operators 표 satisfy eff 표 푝푟푒 표 ≠ ; these are the only ( )\1 ( ) ∅ plan to solve the problem; it must synthesize new symbolic actions operators useful for our purposes. A plan 휋푇 is a sequence of operators 표 , . , 표푛 . A plan 휋 is executable in state 휎 if, for all or operators. An agent using SPOTTER performs RL on low-level ⟨ 1 ⟩ 푇 0 푖 1, , 푛 , 푝푟푒 표푖 휎˜푖 1 where 휎˜푖 = 훿 휎˜푖 1, 표푖 . A plan 휋푇 is actions to learn to reach states from which a plan to the goal exists, ∈ { ··· } ( ) ⊆ − ( − ) and thus can learn a symbolic operator corresponding to “moving said to solve the task 푇 if executing 휋푇 from 휎0 induces a trajectory 휎 , 표 , 휎˜ , . , 표푛, 휎˜푛 that reaches the goal state, namely 휎˜푔 휎˜푛. the ball out of the way”. ⟨ 0 1 1 ⟩ ⊆ An open-world forward search (OWFS) is a breadth-first plan 2 BACKGROUND search procedure where each node is a partial fluent state 휎˜. The successor relationships and applicability of 푂 are specified as de- In this section, we provide a background of relevant concepts in fined above and used to generate successor nodes in the search planning and learning. space. A plan is synthesized once 휎˜푔 has been reached. If 휎0 is a complete state and the operators are complete operators, then each 2.1 Open-World Symbolic Planning node in the search tree will be complete fluent states, as well. We formalize the planning task as an open-world variant of propo- We also define the notion of a “relevant” operator and “regressor” sitional STRIPS [8], 푇 = 퐹, 푂, 휎 , 휎˜푔 .
Recommended publications
  • Classical Planning in Deep Latent Space
    Journal of Artificial Intelligence Research 1 (2020) 1-15 Submitted 1/1; published 1/1 (Table of Contents is provided for convenience and will be removed in the camera-ready.) Contents 1 Introduction 1 2 Preliminaries and Important Background Concepts 4 2.1 Notations . .4 2.2 Propositional Classical Planning . .4 2.3 Processes Required for Obtaining a Classical Planning Model . .4 2.4 Symbol Grounding . .5 2.5 Autoencoders and Variational Autoencoders . .7 2.6 Discrete Variational Autoencoders . .9 3 Latplan: Overview of the System Architecture 10 4 SAE as a Gumbel-Softmax/Binary-Concrete VAE 13 4.1 The Symbol Stability Problem: Issues Caused by Unstable Propositional Symbols . 15 4.2 Addressing the Systemic Uncertainty: Removing the Run-Time Stochasticity . 17 4.3 Addressing the Statistical Uncertainty: Selecting the Appropriate Prior . 17 5 AMA1: An SAE-based Translation from Image Transitions to Ground Actions 19 6 AMA2: Action Symbol Grounding with an Action Auto-Encoder (AAE) 20 + 7 AMA3/AMA3 : Descriptive Action Model Acquisition with Cube-Space AutoEncoder 21 7.1 Cube-Like Graphs and its Equivalence to STRIPS . 22 7.2 Edge Chromatic Number of Cube-Like Graphs and the Number of Action Schema in STRIPS . 24 7.3 Vanilla Space AutoEncoder . 27 7.3.1 Implementation . 32 7.4 Cube-Space AE (CSAE) . 33 7.5 Back-to-Logit and its Equivalence to STRIPS . 34 7.6 Precondition and Effect Rule Extraction . 35 + arXiv:2107.00110v1 [cs.AI] 30 Jun 2021 8 AMA4 : Learning Preconditions as Effects Backward in Time 36 8.1 Complete State Regression Semantics .
    [Show full text]
  • Discovering Underlying Plans Based on Shallow Models
    Discovering Underlying Plans Based on Shallow Models HANKZ HANKUI ZHUO, Sun Yat-Sen University YANTIAN ZHA, Arizona State University SUBBARAO KAMBHAMPATI, Arizona State University XIN TIAN, Sun Yat-Sen University Plan recognition aims to discover target plans (i.e., sequences of actions) behind observed actions, with history plan libraries or action models in hand. Previous approaches either discover plans by maximally “matching” observed actions to plan libraries, assuming target plans are from plan libraries, or infer plans by executing action models to best explain the observed actions, assuming that complete action models are available. In real world applications, however, target plans are often not from plan libraries, and complete action models are often not available, since building complete sets of plans and complete action models are often difficult or expensive. In this paper we view plan libraries as corpora and learn vector representations of actions using the corpora; we then discover target plans based on the vector representations. Specifically, we propose two approaches, DUP and RNNPlanner, to discover target plans based on vector representations of actions. DUP explores the EM-style (Expectation Maximization) framework to capture local contexts of actions and discover target plans by optimizing the probability of target plans, while RNNPlanner aims to leverage long-short term contexts of actions based on RNNs (recurrent neural networks) framework to help recognize target plans. In the experiments, we empirically show that our approaches are capable of discovering underlying plans that are not from plan libraries, without requiring action models provided. We demonstrate the effectiveness of our approaches by comparing its performance to traditional plan recognition approaches in three planning domains.
    [Show full text]
  • Learning to Plan from Raw Data in Grid-Based Games
    EPiC Series in Computing Volume 55, 2018, Pages 54{67 GCAI-2018. 4th Global Con- ference on Artificial Intelligence Learning to Plan from Raw Data in Grid-based Games Andrea Dittadi, Thomas Bolander, and Ole Winther Technical University of Denmark, Lyngby, Denmark fadit, tobo, [email protected] Abstract An agent that autonomously learns to act in its environment must acquire a model of the domain dynamics. This can be a challenging task, especially in real-world domains, where observations are high-dimensional and noisy. Although in automated planning the dynamics are typically given, there are action schema learning approaches that learn sym- bolic rules (e.g. STRIPS or PDDL) to be used by traditional planners. However, these algorithms rely on logical descriptions of environment observations. In contrast, recent methods in deep reinforcement learning for games learn from pixel observations. However, they typically do not acquire an environment model, but a policy for one-step action selec- tion. Even when a model is learned, it cannot generalize to unseen instances of the training domain. Here we propose a neural network-based method that learns from visual obser- vations an approximate, compact, implicit representation of the domain dynamics, which can be used for planning with standard search algorithms, and generalizes to novel domain instances. The learned model is composed of submodules, each implicitly representing an action schema in the traditional sense. We evaluate our approach on visual versions of the standard domain Sokoban, and show that, by training on one single instance, it learns a transition model that can be successfully used to solve new levels of the game.
    [Show full text]
  • Learning Complex Action Models with Quantifiers and Logical Implications
    Artificial Intelligence 174 (2010) 1540–1569 Contents lists available at ScienceDirect Artificial Intelligence www.elsevier.com/locate/artint Learning complex action models with quantifiers and logical implications ∗ Hankz Hankui Zhuo a,b, Qiang Yang b, , Derek Hao Hu b,LeiLia a Department of Computer Science, Sun Yat-sen University, Guangzhou, China, 510275 b Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clearwater Bay, Kowloon, Hong Kong article info abstract Article history: Automated planning requires action models described using languages such as the Planning Received 2 January 2010 Domain Definition Language (PDDL) as input, but building action models from scratch is a Received in revised form 4 September 2010 very difficult and time-consuming task, even for experts. This is because it is difficult to Accepted 5 September 2010 formally describe all conditions and changes, reflected in the preconditions and effects of Available online 29 September 2010 action models. In the past, there have been algorithms that can automatically learn simple Keywords: action models from plan traces. However, there are many cases in the real world where Action model learning we need more complicated expressions based on universal and existential quantifiers, Machine learning as well as logical implications in action models to precisely describe the underlying Knowledge engineering mechanisms of the actions. Such complex action models cannot be learned using many Automated planning previous algorithms. In this article, we present a novel algorithm called LAMP (Learning Action Models from Plan traces), to learn action models with quantifiers and logical implications from a set of observed plan traces with only partially observed intermediate state information.
    [Show full text]
  • Arxiv:1803.02208V1 [Cs.AI] 4 Mar 2018 Subbarao Kambhampati Arizona State University E-Mail: [email protected] 2 Hankz Hankui Zhuo Et Al
    Noname manuscript No. (will be inserted by the editor) Discovering Underlying Plans Based on Shallow Models Hankz Hankui Zhuo · Yantian Zha · Subbarao Kambhampati the date of receipt and acceptance should be inserted later Abstract Plan recognition aims to discover target plans (i.e., sequences of actions) behind observed actions, with history plan libraries or domain models in hand. Pre- vious approaches either discover plans by maximally “matching” observed actions to plan libraries, assuming target plans are from plan libraries, or infer plans by execut- ing domain models to best explain the observed actions, assuming that complete do- main models are available. In real world applications, however, target plans are often not from plan libraries, and complete domain models are often not available, since building complete sets of plans and complete domain models are often difficult or expensive. In this paper we view plan libraries as corpora and learn vector represen- tations of actions using the corpora; we then discover target plans based on the vector representations. Specifically, we propose two approaches, DUP and RNNPlanner, to discover target plans based on vector representations of actions. DUP explores the EM-style framework to capture local contexts of actions and discover target plans by optimizing the probability of target plans, while RNNPlanner aims to lever- age long-short term contexts of actions based on RNNs (recurrent neural networks) framework to help recognize target plans. In the experiments, we empirically show that our approaches are capable of discovering underlying plans that are not from plan libraries, without requiring domain models provided. We demonstrate the effective- ness of our approaches by comparing its performance to traditional plan recognition approaches in three planning domains.
    [Show full text]
  • Towards Learning in Probabilistic Action Selection: Markov Systems and Markov Decision Processes
    Towards Learning in Probabilistic Action Selection: Markov Systems and Markov Decision Processes Manuela Veloso Carnegie Mellon University Computer Science Department Planning - Fall 2001 Remember the examples on the board. 15-889 – Fall 2001 Veloso, Carnegie Mellon Different Aspects of “Machine Learning” Supervised learning • – Classification - concept learning – Learning from labeled data – Function approximation Unsupervised learning • – Data is not labeled – Data needs to be grouped, clustered – We need distance metric Control and action model learning • – Learning to select actions efficiently – Feedback: goal achievement, failure, reward – Search control learning, reinforcement learning 15-889 – Fall 2001 Veloso, Carnegie Mellon Search Control Learning Improve search efficiency, plan quality • Learn heuristics • (if (and (goal (enjoy weather)) (goal (save energy)) (state (weather fair))) (then (select action RIDE-BIKE))) 15-889 – Fall 2001 Veloso, Carnegie Mellon Learning Opportunities in Planning Learning to improve planning efficiency • Learning the domain model • Learning to improve plan quality • Learning a universal plan • Which action model, which planning algorithm, which heuristic control is the most efficient for a given task? 15-889 – Fall 2001 Veloso, Carnegie Mellon Reinforcement Learning A variety of algorithms to address: • – learning the model – converging to the optimal plan. 15-889 – Fall 2001 Veloso, Carnegie Mellon Discounted Rewards “Reward” today versus future (promised) reward • $100K + $100K + $100K + . • INFINITE!
    [Show full text]
  • Outline of Machine Learning
    Outline of machine learning The following outline is provided as an overview of and topical guide to machine learning: Machine learning – subfield of computer science[1] (more particularly soft computing) that evolved from the study of pattern recognition and computational learning theory in artificial intelligence.[1] In 1959, Arthur Samuel defined machine learning as a "Field of study that gives computers the ability to learn without being explicitly programmed".[2] Machine learning explores the study and construction of algorithms that can learn from and make predictions on data.[3] Such algorithms operate by building a model from an example training set of input observations in order to make data-driven predictions or decisions expressed as outputs, rather than following strictly static program instructions. Contents What type of thing is machine learning? Branches of machine learning Subfields of machine learning Cross-disciplinary fields involving machine learning Applications of machine learning Machine learning hardware Machine learning tools Machine learning frameworks Machine learning libraries Machine learning algorithms Machine learning methods Dimensionality reduction Ensemble learning Meta learning Reinforcement learning Supervised learning Unsupervised learning Semi-supervised learning Deep learning Other machine learning methods and problems Machine learning research History of machine learning Machine learning projects Machine learning organizations Machine learning conferences and workshops Machine learning publications
    [Show full text]
  • From Unlabeled Images to PDDL (And Back)
    Classical Planning in Latent Space: From Unlabeled Images to PDDL (and back) Masataro Asai, Alex Fukunaga Graduate School of Arts and Sciences University of Tokyo Abstract Current domain-independent, classical planners re- ? quire symbolic models of the problem domain and Initial state Goal state image Original instance as input, resulting in a knowledge acquisi- image (black/white) Mandrill image tion bottleneck. Meanwhile, although recent work in deep learning has achieved impressive results in Figure 1: An image-based 8-puzzle. many fields, the knowledge is encoded in a subsym- bolic representation which cannot be directly used [ ] by symbolic systems such as planners. We propose Mnih et al., 2015; Graves et al., 2016 . However, the current LatPlan, an integrated architecture combining deep state-of-the-art in pure NN-based systems do not yet provide learning and a classical planner. Given a set of un- guarantees provided by symbolic planning systems, such as labeled training image pairs showing allowed ac- deterministic completeness and solution optimality. tions in the problem domain, and a pair of images Using a NN-based perceptual system to automatically pro- representing the start and goal states, LatPlan uses vide input models for domain-independent planners could a Variational Autoencoder to generate a discrete greatly expand the applicability of planning technology and latent vector from the images, based on which a offer the benefits of both paradigms. We consider the problem PDDL model can be constructed and then solved of robustly, automatically bridging the gap between such sub- by an off-the-shelf planner. We evaluate LatPlan us- symbolic representations and the symbolic representations ing image-based versions of 3 planning domains: required by domain-independent planners.
    [Show full text]
  • Integrating Planning, Execution and Learning to Improve Plan Execution
    Computational Intelligence, Volume 00, Number 000, 2009 Integrating Planning, Execution and Learning to Improve Plan Execution Sergio Jimenez´ , Fernando Fernandez´ and Daniel Borrajo Departamento de Inform´aatica, Universidad Carlos III de Madrid. Avda. de la Universidad, 30. Legan´es(Madrid). Spain Algorithms for planning under uncertainty require accurate action models that explicitly cap- ture the uncertainty of the environment. Unfortunately, obtaining these models is usually complex. In environments with uncertainty, actions may produce countless outcomes and hence, specifying them and their probability is a hard task. As a consequence, when implementing agents with planning capabilities, practitioners frequently opt for architectures that interleave classical planning and execution monitoring following a replanning when failure paradigm. Though this approach is more practical, it may produce fragile plans that need continuous replanning episodes or even worse, that result in execution dead-ends. In this paper, we propose a new architecture to relieve these shortcomings. The architecture is based on the integration of a relational learning component and the traditional planning and execution monitoring components. The new component allows the architecture to learn probabilistic rules of the success of actions from the execution of plans and to automatically upgrade the planning model with these rules. The upgraded models can be used by any classical planner that handles metric functions or, alternatively, by any probabilistic planner. This architecture proposal is designed to integrate off-the-shelf interchangeable planning and learning components so it can profit from the last advances in both fields without modifying the architecture. Key words: Cognitive architectures, Relational reinforcement learning, Symbolic planning. 1.
    [Show full text]
  • Action Model Learning: Challenges and Techniques
    Comenius University, Bratislava Faculty of Mathematics, Physics and Informatics Department of Applied Informatics Action Model Learning: Challenges and echniques Dissertation Thesis Michal Certick" y$ Bratislava, 2013 Comenius University Faculty of Mathematics, Physics and Informatics Department of Applied Informatics Action Model Learning: Challenges and Techniques Dissertation Thesis Author: RNDr. Michal Certic!"y #upervisor: doc. PhDr. $"an #efr"anek CSc. #tudy Programme: &.2.(. Computer #cience Bratislava 2013 15391687 �������� ���������� �� ���������� ������� �� ������������ ������� ��� ����������� ������ ���������� ���� ��� �������� ����� ������ �������� ����� ���������� �������� ������� ������� ������ ������ ����� ���� ����� ���� ���� ����� ����� �� ������ ������ �������� �������� ����������� ���� �� ������� ������������ ������ �������� �� ������� ������� ��������� ��������� ������ ������ ������ ����� ��������� ���������� ��� ���������� ���� � �� ������� ��� �������� ������� ��� ��������� ������ ����� ��������� ����� �� ��� ��� �� ���������� �������� �� ����� ������������� �� ���������� ����������� � ������ ��� ���������� ��� ������� ����� � ������� ����������� �� ������ ��� ���������� �� ���������� ����������� ������ ���� ���� ��� ��������� ���� ����������� �������� � ���������� �� ������� ����������� ������ �������� ���� ����� ��� ������ ���� ��������� ���������� ��������� ���������� ����� ����� ��������� ������ ���� ��������� �� ����� ��������� ������� ����� 15391687 ���������� ���������� � ���������� ������� ����������� ������
    [Show full text]
  • Discrete Word Embedding for Logical Natural Language Understanding
    Under review as a conference paper at ICLR 2021 DISCRETE WORD EMBEDDING FOR LOGICAL NATU- RAL LANGUAGE UNDERSTANDING Anonymous authors Paper under double-blind review ABSTRACT We propose an unsupervised neural model for learning a discrete embedding of words. Unlike existing discrete embeddings, our binary embedding supports vec- tor arithmetic operations similar to continuous embeddings. Our embedding rep- resents each word as a set of propositional statements describing a transition rule in classical/STRIPS planning formalism. This makes the embedding directly com- patible with symbolic, state of the art classical planning solvers. 1 INTRODUCTION When we researchers write a manuscript for a conference submission, we do not merely follow the probability distribution crystalized in our brain cells. Instead, we modify, erase, rewrite sentences over and over, while only occasionally let the fingers produce a long stream of thought. This writing process tends to be a zero-shot attempt to materialize and optimize a novel work that has never been written. Except for casual writing (e.g. online messages), intelligent writing inevitably contains an aspect of backtracking and heuristic search behavior, and thus is often like planning for information delivery while optimizing various metrics, such as the impact, ease of reading, or conciseness. After the initial success of the distributed word representation in Word2Vec (Mikolov et al., 2013b), natural language processing techniques have achieved tremendous progress in the last decade, pro- pelled primarily by the advancement in data-driven machine learning approaches based on neural networks. However, these purely data-driven approaches that blindly follow the highest probability interpolated from data at each time step could suffer from biased decision making (Caliskan et al., 2017; Bolukbasi et al., 2016) and is heavily criticized recently.
    [Show full text]
  • Extending Symbolic Planning Operators Through Targeted Reinforcement Learning
    Main Track AAMAS 2021, May 3-7, 2021, Online SPOTTER: Extending Symbolic Planning Operators through Targeted Reinforcement Learning Vasanth Sarathy∗ Daniel Kasenberg∗ Shivam Goel Smart Information Flow Technologies Tufts University Tufts University Lexintgon, MA Medford, MA Medford, MA [email protected] [email protected] [email protected] Jivko Sinapov Matthias Scheutz Tufts University Tufts University Medford, MA Medford, MA [email protected] [email protected] ABSTRACT used to complete a variety of tasks), available human knowledge, Symbolic planning models allow decision-making agents to se- and interpretability. However, the models are often handcrafted, quence actions in arbitrary ways to achieve a variety of goals in difficult to design and implement, and require precise formulations dynamic domains. However, they are typically handcrafted and that can be sensitive to human error. Reinforcement learning (RL) tend to require precise formulations that are not robust to human approaches do not assume the existence of such a domain model, error. Reinforcement learning (RL) approaches do not require such and instead attempt to learn suitable models or control policies models, and instead learn domain dynamics by exploring the en- by trial-and-error interactions in the environment [28]. However, vironment and collecting rewards. However, RL approaches tend RL approaches tend to require a substantial amount of training in to require millions of episodes of experience and often learn poli- moderately complex environments. Moreover, it has been difficult cies that are not easily transferable to other tasks. In this paper, to learn abstractions to the level of those used in symbolic planning we address one aspect of the open problem of integrating these approaches through low-level reinforcement-based exploration.
    [Show full text]