arXiv:2108.04812v1 [cs.CL] 10 Aug 2021

Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior

Noriyuki Kojima, Alane Suhr, and Yoav Artzi
Department of Computer Science and Cornell Tech, Cornell University
[email protected] {suhr, yoav}@cs.cornell.edu

Abstract

We study continual learning for natural language instruction generation, by observing human users' instruction execution. We focus on a collaborative scenario, where the system both acts and delegates tasks to human users using natural language. We compare user execution of generated instructions to the original system intent as an indication to the system's success communicating its intent. We show how to use this signal to improve the system's ability to generate instructions via contextual bandit learning. In interaction with real users, our system demonstrates dramatic improvements in its ability to generate language over time.

[Figure 1 diagram. Panels: Initialization; Learning from User Behavior (rounds r = 1, 2, 3, ...). Components: supervised datasets, dataset D_0, supervised training, natural language generation model (GPT-2 weights), user interactions with x̄ ∼ P(· | ·, ·; θ_r), construction of dataset D_r, datasets D_0, ..., D_r, contextual bandit training.]

Figure 1: Diagram of our learning process. We initialize a generation model using supervised learning, and continually learn through interaction with users, by alternating between observing user execution of generated instructions and training.

1 Introduction

Natural language provides an expressive and accessible avenue to instruct non-expert users. The ability to generate instructions is critical for systems that collaborate with users, for example to delegate tasks. In such scenarios, the system generates language to communicate to the user a latent intent. When users are cooperative and proficient in the language, whether they accomplish the system's intent provides an informative, albeit noisy signal to the quality of instruction generation.

This implicit signal is fundamentally different from supervised data, including via active learning, in that it does not label the system's intent with a written instruction, but only provides evidence to the quality of a given instruction in relaying this intent. As a natural byproduct of interaction with users, it also differs from explicit user feedback in not requiring user action beyond what they already do as part of the interaction. Despite its potential and prevalence, this signal is understudied for

[...]

generation by observing human users executing generated instructions. We learn by comparing instruction execution to the system intent, and demonstrate how this results in a system that continually improves its natural language generation ability through interaction with users. Figure 1 illustrates our learning process.

We design a task-oriented collaborative scenario using the CEREALBAR game environment (Suhr et al., 2019). In CEREALBAR, two agents, a leader and a follower, work together to complete tasks. The leader plans the tasks to complete, and communicates goals to the follower using natural language. CEREALBAR was originally introduced for studying follower instruction execution. We modify it to focus on generation of leader instructions, which are then executed by human followers. The collaborative, embodied setup effectively engages users, and aligns their incentives with executing the system's instructions to the best of their abilities.

A major challenge is inferring a learning sig-
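The alternation described in the abstract and the Figure 1 caption (deploy the current model, observe how users execute its instructions, then train on the observations) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the intents, the candidate instructions, the `simulated_follower` function standing in for human users, the tabular softmax policy standing in for the generation model, the +1/-1 reward, and the plain policy-gradient update.

```python
import math
import random

# Hypothetical system intents and candidate instructions (illustrative only).
INTENTS = ["collect_red_card", "collect_blue_card"]
INSTRUCTIONS = ["get the red card", "get the blue card"]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def simulated_follower(instruction):
    """Stand-in for a human follower: returns the goal the instruction leads to."""
    return "collect_red_card" if "red" in instruction else "collect_blue_card"


def run_rounds(num_rounds=20, interactions_per_round=50, lr=0.5, seed=0):
    rng = random.Random(seed)
    # One score per (intent, instruction) pair; zero scores give a uniform policy.
    theta = {c: [0.0] * len(INSTRUCTIONS) for c in INTENTS}
    for _ in range(num_rounds):
        # Observation phase: deploy the current policy and log bandit feedback.
        dataset = []
        for _ in range(interactions_per_round):
            intent = rng.choice(INTENTS)
            probs = softmax(theta[intent])
            action = rng.choices(range(len(INSTRUCTIONS)), weights=probs)[0]
            executed = simulated_follower(INSTRUCTIONS[action])
            # Assumed reward: +1 if the execution matches the intent, else -1.
            reward = 1.0 if executed == intent else -1.0
            dataset.append((intent, action, reward))
        # Training phase: one policy-gradient step per logged example.
        for intent, action, reward in dataset:
            probs = softmax(theta[intent])
            for a in range(len(INSTRUCTIONS)):
                grad = (1.0 if a == action else 0.0) - probs[a]
                theta[intent][a] += lr * reward * grad / len(dataset)
    return theta


def success_probability(theta):
    """Average probability of picking an instruction whose execution matches the intent."""
    total = 0.0
    for intent in INTENTS:
        probs = softmax(theta[intent])
        total += sum(p for p, x in zip(probs, INSTRUCTIONS)
                     if simulated_follower(x) == intent)
    return total / len(INTENTS)


if __name__ == "__main__":
    print(success_probability(run_rounds()))
```

The feedback is bandit-style: in each interaction the system sees a reward only for the instruction it actually generated, never for the alternatives, which is what separates this signal from supervised data that labels an intent with a written instruction.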
