Action Model Learning: Challenges and Techniques

Comenius University, Bratislava Faculty of Mathematics, Physics and Informatics Department of Applied Informatics Action Model Learning: Challenges and echniques Dissertation Thesis Michal Certick" y$ Bratislava, 2013 Comenius University Faculty of Mathematics, Physics and Informatics Department of Applied Informatics Action Model Learning: Challenges and Techniques Dissertation Thesis Author: RNDr. Michal Certic!"y #upervisor: doc. PhDr. $"an #efr"anek CSc. #tudy Programme: &.2.(. Computer #cience Bratislava 2013 15391687 �� 15391687 �� Abstract Knowledge about domain dynamics, describing how certain actions affect the world, is called an action model and constitutes the essential requirement for planning and goal-oriented intelligent behaviour. Action learning, as the automatic construction of action models based on the observations, has become a hot research topic in recent years, and various methods employing a wide variety of AI tools have been developed. Diversity of these methods renders each of them usable under different conditions and in various domains. We have identi ed a common collection of important properties and used them as a basis for the comprehensive comparison and as a set of challenges that we were trying to overcome when designing our two new methods. First of them, an online algorithm called "SG, can be used to learn the probabilistic action models with conditional effects in the presence of incomplete observations, sensoric noise, and action failures, while #eeping the computa- tional complexity low $polynomial w.r.t. the size of the input&, thus managing to cope with all the appointed challenges. 'ur experiments, conducted in two fundamentally different domains (action computer game Unreal Tour- nament 2004 and a real-world robotic platform SyRoTek&, demonstrate its usability in computationally intensive scenarios, as well as in the presence of action failures and signi cant sensoric noise. 'ur second method, based on a certain paradigm of logic programming called Reactive ASP, is not as versatile, being able to deal with only half of the challenges, but may still prove practical in some situations because of its purely declarative nature. Keywords: action model, learning, partially observable, probabilistic, noise, action failures, conditional effects, answer set programming Abstrakt Znalosti o dynami#e dom*eny, popisu+*uce a#o jednotliv*eakcie ovplyv,nu+*u svet, sa na%*yva+*u modelom akci!" a predstavu+*unutn*upodmien#u pre pl*anovanie a ciel-avedom*einteligentn*espr*avanie. U#ceniesa akci!", teda automatic- #*a#on,stru#cia a spres,novanie modelov akci*. na %*a#lade pozorovan*., sa v posledn*ych rokoch stalo atra#t*.vnou v*yskumnou t*emou a ob+avilo sa niekol-#o zau+*.mav*ych metód, vyu,%*.va+*ucich ,siro#*upaletu met*odUI. Ich diverzita m*a za n*asledok fa#t, ,%e #a,%d*a% nich je pou,%itel-n*a%a r0oznych podmieno#, resp. v rozdielnych typoch dom*en. Identi #ovali sme spolo,cn*umno,zinu d0ole,%it*ych vlastnost*.t*ychto met*od a pou,zili ich ako %*aklad pre ich d0o#ladn*uanal*y%u a a#o ciele pri tvorbe na,sich dvoch nov*ych metód. Prv*a% nich, online algoritmus s n*azvom "SG, produ#uje pravdepodobnostn!e modely akci*.s podmienen!ymiefektami, a vyrovn*asa so $lyhan!ımakci!", sen- zorick!ym#sumom, a ne!upln!ymipozorovaniami, pri,com v,sak udr,ziava svoju ,casov*uzlo,%itost- dostato,cne n*.%#u $polynomi*alnu vzhl-adom na vel-#ost- vstu- pu&. Na,seexperimenty vy#onan*ev dvoch fundament*alne odli,sn*ych dom*e- nach $ak,cnej po,c*.ta,cove+ hre Unreal Tournament 2004 a re*alnej robotic#ej platforme SyRoTek), demon,stru+*upou,%itel-nost- tohoto algoritmu vo v*ypo,ctovo n*aro,cn*ych prostrediach, a#o a+ za pr*.tomnosti nezanedbatel-n*eho senzoric#*eho ,sumu a zlyhan*.a#ci*.. Na,sa druh*amet*oda, %alo,zen*ana ,speci c#ej paradigme logic#*eho programova- nia naz*yvanej Reakt!ıvneASP, nie je natol-#o univerz*alna, #ed-,zesp*l,na iba polovicu nami stanoven*ych ciel-ov. 2apriek tomu v,sa# m0o,zebyt- vhodn*ym #andid*atom v ist*ych situ*aci*ach, vd-a#a svojej deklarat*.vnej povahe. Kl’úˇcovéslová: model akci*., u,cenie, pl*anovanie, ,ciasto,cn*apozorovatel-nost-, pravdepodobnost-, ,sum, zlyhania a#ci*., podmienen*eefekty, AS1 Contents 1 Introduction 9 4.4 5ackground . ............. ............ 6 4.7 1roblems . ............. ............ 48 4." 1roposed 3olutions............ ............ 44 4.9 'utline. ............. ............ 47 2 Preliminaries 14 7.4 1lanning: Why do we need a domain description; . ............. ......... 49 7.7 <epresentation =anguages: >ow to express our action model; ........... ......... 4? 7." Action =earning: Why do we need to obtain action models automatically; . ......... 4@ 7.9 Ahallenges and 1roperties . ............ 4B Evaluation based on Properties 27 ".4 3=AF - 3imultaneous =earning and Filtering . ............. ......... 76 ".7 ARC3 - Action-Relation Codelling 3ystem . ............. ......... "" "." =earning through =ogic 1rogramming and A-Prolog . ............. ......... "6 ".9 Kernelised Doted 1erceptrons with Deictic Coding . ............. ......... 9B ".? Ereedy 3earch for =earning Noisy Deictic <ule 3ets ............. ......... ?9 ".@ New Methods . ............. ............@7 4 Evaluation based on Domain Compatibility 64 & Basic Notions %) % Representation Structures #, @.4 Fransition Relation ............ ......... B8 TR @.7 Gffect <elation ............. ........... B4 ER @." Gffect !ormula ............ ............B7 EF @.9 A-Prolog =iterals ............. .......... B9 AL @.? 1robability !unction ........... ............B@ # !-amples. Action /earning and Performance 2easures #) B.4 1ositive and 2egative G(amples . ............BH B.7 Action =earning . ............. ............H7 B." Ciscellaneous Notions .......... ............H" )"SG Algorithm )& H.4 Fhe Algorithm . ............. ............H? H.7 Aorrectness and Aomplexity . ............H6 H." 1erformance Cetrics . ............ 6" H.9 Additional Ideas . ............. ............ 6@ 9 !-"eriments with"SG Algorithm 97 6.4 Unreal Fournament7889......... ............ 6B 6.7 <obotic 1latform 3yRoTe# . ............489 6." 3ummary . ............. ............48B 1, Reactive ASP Method 109 40.4 <eactive AS1 . ............. ............448 40.7 =earning with <eactive AS1 . ............444 40." Noise and Non-determinism . ............44@ 40.9 3imilarities and Differences . ............44B 40.? 1erformance Cetrics . ............474 11 !-"eriments wit3 Reactive ASP 2ethod 123 41.4 !ully 'bservable 5loc#s World . 479 41.7 1artially 'bservable 5locks World . ............47@ 41." 3ummary . ............. ............47H 12 Conclusion and 4uture 5ork 129 A""endices 1 A Additional Properties of !6ects (learned by"SG8 1 ' 5orst Case 1or"SG Algorithm 1 # C Dynamic Error 9olerance 1or *eactive ASP Method 143 Chapter 1 Introduction 1.1 Back0round Knowledge about domain dynamics, describing how certain actions affect the world, is essential for planning and intelligent goal-oriented behaviour of both living and arti cial agents. 3uch #nowledge is in arti cial systems referred to as action model, and is usually manually constructed by domain experts. Complete speci cation of action models in case of complex domains is, how- ever, a diIcult and time consuming task. In addition to that, it is often needed to modify this action model when confronted with new information. <esearch on various methods of automatic construction of action models has therefore become a hot topic in recent years. Fhis inductive process of con- structing and subsequent modi cation of action models is referred to as action

Action Model Learning: Challenges and Techniques

Classical Planning in Deep Latent Space

Discovering Underlying Plans Based on Shallow Models

Learning to Plan from Raw Data in Grid-Based Games

Learning Complex Action Models with Quantifiers and Logical Implications

Arxiv:1803.02208V1 [Cs.AI] 4 Mar 2018 Subbarao Kambhampati Arizona State University E-Mail: [email protected] 2 Hankz Hankui Zhuo Et Al

Towards Learning in Probabilistic Action Selection: Markov Systems and Markov Decision Processes

Outline of Machine Learning

From Unlabeled Images to PDDL (And Back)

Integrating Planning, Execution and Learning to Improve Plan Execution

Discrete Word Embedding for Logical Natural Language Understanding

Extending Symbolic Planning Operators Through Targeted Reinforcement Learning

Pre-Proceedings of the 12Th International Workshop on Neural-Symbolic Learning and Reasoning Nesy’17