Zur Erlangung des akademischen Grades eines Doktors der Ingenieurwissenschaften (Dr.-Ing.) von der KIT-Fakultät für Wirtschaftswissenschaften des Karlsruher Instituts für Technologie (KIT) genehmigte Dissertation von Patrick Raoul Philipp, M.Sc.

Decision-Making with Multi-Step Expert Advice on the Web

Patrick Raoul Philipp

Tag der mündlichen Prüfung: 13.04.2018 Referent: PD Dr. Achim Rettinger Korreferent: Prof. Dr. Kristian Kersting

Karlsruhe, 2018

mg rcsigadSria hs eonto,adNPfrtxuldt nteWeb, the on (NERD). Disambiguation data and textual Recognition Entity for Named of NLP task and the with Recognition, deal Phase we where Surgical and Processing Image architectures. Web-based the in with experts of modeled execution (SEPs) vocabularies data-driven Processes Expert the structured Semantic enable to with to EPs experts extend and of (RDF) Framework Semantic representations Description the Resource the for developed lift methods We exploiting by Web. pipelines discovering, Learning automatically expert Relational for composing framework Statistical and a from develop executing We techniques we (2) integrate which coordination. and Multiagent (MDP), or (EP) Process (SRL) Process Decision Markov Expert a as of to specialization refer as decision-making. decentralized problem and the centralized formalize between We instances task distinguish of we number where a solutions), (comprising and sets data reference available assuming pipelines expert configurations. au- service task correct Web user-specified finding interpreting parameters, efficiently data, pipelines. or input and expert goals services and expert compose available functionalities integrate and complex to execute entails of discover, tomation descriptions to need textual multi- solve we mostly automatically To services, With also (2) expert while expert. with solutions, an executing correct tasks pipelines to of step adhered expert number costs individual the the construct maximize account to to into order taking how for in learn solutions instances must correct task thus generates individual One pipeline for single instances. no task complex, possible sufficiently all is task the of that consist hypotheses.Given often generate to which experts tasks, of diverse pipelines solve pay require to customers thus used having and be and steps can platform multiple experts their ex- Such via monetize executions. available Algorithmia single services same as for expert the such making platforms for by provider experts Service access exchangeable Web. pert of the amount on increasing available an are steps progresses research as and sion, experts where Web, task the individual on for tasks services. solving experts Web maximizing of as combining process for available and the are methods selecting automating with for by contribute methods solutions We and algo- task instances are tasks. correct which such of experts, of number from steps the advice individual using by solve tasks to multi-step rithms solving with deals thesis This Abstract eeaut u ehd ndfeetdmis aeyMdclAssac ihtssin tasks with Assistance Medical namely domains, different in methods our evaluate We well-performing learn to enable We (1) problems: both to solutions present we thesis, this In (1) services: expert with tasks multi-step solving for problems distinct two perceive We Vi- Computer or (NLP) Processing Language Natural in occur frequently tasks Multi-step v

0

eta ela grsi hstei aeprl enpbihdi h olwn papers: following the in published been partly have thesis this in figures as well as Text Publications ytm,AMS1,SoPuo Brazil Paulo, São AAMAS'17, ad- Systems, expert multi-step for In learning Reinforcement vice. Rettinger. Achim and Philipp Patrick Modeling and for In [114] Interventions 2016. pipeline Robotic tools surgery. Cognitive Procedures, Maleshkova, valve Image-Guided Maria Heuveline. mitral Wolf, Vincent of Ivo and assistance Studer, Simone, Rudi De Volovyk, Rettinger, Raffaele Mykola Achim Nolden, Engelhardt, Sandy Marco Weller, Fetzer, Tobias Andreas Philipp, Patrick Schoch, Nicolai [102] y. algorithms. Surgery assistance and Studer. Radiology medical sisted Rudi of and Anna-Laura pipelines Müller-Stich, Götz, P. Nolden, cognitive Beat Michael Toward Marco Kenngott, Weber, Hannes Kämpgen, Dillmann, Christian Benedikt Rüdiger Wekerle, Katic, Speidel, Stefanie Darko Rettinger, Maleshkova, Achim Maria Philipp, Patrick Netherlands The [101] doi:10.1007/978-3-319-19890-3_25. Rotterdam, 2015. semantic ICWE'15, 409, A Conference, International Katic. 15th Darko - In and making. decision Rettinger, sequential Achim for framework Maleshkova, Maria Philipp, Patrick del Riva (ISWC'14), Conference Web Semantic Italy In International Garda, (COLD'14) images. 13th Data medical the Linked of Consuming with pre-processing on co-located Workshop the International automating 5th Us- for the apis of Rettinger. Proceedings web Achim and and We- data Maier-Hein, Christian linked Klaus Götz, ing Zelzer, Michael Sascha Kämpgen, Philipp, Benedikt Patrick ber, Maleshkova, Maria Gemmeke, Philipp rceig fte1t ofrneo uooosAet n Multiagent and Agents Autonomous on Conference 16th the of Proceedings ae 53,21.[51] 2014. 25–36, pages , 19:7315,21.di 10.1007/s11548-015-1322- doi: 2016. 11(9):1743–1753, , ae 6–7,21.[100] 2017. 962–971, pages , rceig fSI,MdclImaging'16: Medical SPIE, of Proceedings niern h e nteBgDt Era Data Big the in Web the Engineering n.J optrAs- Computer J. Int. ae 9786–9794, pages , ae 392– pages , vii

0 Patrick Philipp, Achim Rettinger, and Maria Maleshkova. On automating de- centralized multi-step service combination. In IEEE International Conference on Web Services, ICWS'17, Honolulu, HI, USA, pages 736–743, 2017. doi: 10.1109/ICWS.2017.89. [104]

Patrick Philipp, Maria Maleshkova, Achim Rettinger, and Darko Katic. A semantic framework for sequential decision making. J. Web Eng., 16(5&6):471–504, 2017. [103]

viii 23 17 1 3 Preliminaries 3 Scenarios 2 Overview & Introduction 1 Foundations I Contents 44 ...... 34 ...... 23 ...... 27 ...... Web . Semantic ...... 3.4 ...... 19 Decision-Making ...... 3.3 ...... 18 ...... 15 ...... Learning . . . Machine . Supervised . . . 13 . 12 . . . Tasks . 3.2 Multi-Step ...... 3.1 . . . . . 17 ...... 7 ...... 3 ...... Mapping . . . . Progression 10 . . Tumor ...... 2.3 ...... Recognition . . . . . Phase . . Surgical ...... 2.2 ...... Disambiguation . . & . . . . Recognition . . Entity . . . . Named ...... 2.1 ...... Outline . . . Thesis . . . . the of . . . 1.6 . Scope ...... 1.5 Contributions . . Hypotheses & . . Questions 1.4 Research . . . . . 1.3 ...... Challenges . . 1.2 Introduction 1.1 45 . 42 38 ...... 35 ...... 33 . . 30 ...... 30 29 ...... 34 ...... Framework . . . Description . . 28 . . . Resource ...... 20 . . . . . 3.4.1 ...... Processes . . . . Decision . . . . 19 . Multiagent ...... Processes . . . . Decision . . . . Markov . . 3.3.3 . . . . . Advice . . . . Expert . . . . with 19 . . . . Prediction . 3.3.2 ...... 18 . . 3.3.1 . Models . . . over . . . . . Functions . . . Combination . . . . . Learning ...... Models . . . . 18 Learned . . Evaluating . . . 3.2.6 ...... Scenarios . . . . Learning . . . 17 . Selected . 3.2.5 ...... Settings ...... Prediction . 3.2.4 ...... Protocols . . . Learning . 3.2.3 ...... Assumptions . . . Model-specific . 3.2.2 ...... 3.2.1 ...... 9 7 . . . Challenges ...... Description 2.3.2 ...... 2.3.1 ...... Challenges ...... Description 2.2.2 ...... 2.2.1 ...... Challenges ...... Description 2.1.2 . . . . . 2.1.1 ...... Automation . Learning 1.2.2 1.2.1 ix

0 Contents

3.4.2 RDF Vocabularies...... 45 3.4.3 Notation3...... 46 3.4.4 SPARQL...... 46 3.4.5 Linked Data...... 47 3.4.6 Services in the Semantic Web...... 48

4 Related Work 51 4.1 Multi-Step Tasks in Single-Agent Systems...... 51 4.1.1 Learning- & Decision-Theoretic Approaches...... 51 4.1.2 Service Approaches...... 53 4.2 Multi-Step Tasks in Multiagent Systems...... 53 4.2.1 Multiagent Systems for Single-Agent Tasks...... 53 4.2.2 Multiagent Systems for Multiagent Tasks...... 54 4.3 Multi-Step Tasks on the Web...... 55 4.3.1 Semantic Web Service Description Languages...... 55 4.3.2 Decision-Making Frameworks & Applications...... 55 4.3.3 Workflow Systems...... 56

II Learning 59

5 Learning with Expert Processes 61 5.1 Introduction...... 61 5.1.1 Challenges...... 63 5.1.2 Contributions...... 64 5.2 Problem Formalization...... 64 5.3 Meta Dependencies...... 68 5.3.1 Single Experts...... 70 5.3.2 Pairwise Intra-Step Experts...... 71 5.3.3 Pairwise Inter-Step Experts...... 71 5.4 Online Reinforcement Learning...... 74 5.4.1 EWH with Meta Dependencies...... 74 5.4.2 EWH with Incomplete Information...... 75 5.5 Batch Reinforcement Learning...... 77 5.5.1 Probabilistic Soft Logic and Hinge-Loss Markov Random Fields... 78 5.5.2 Meta Dependencies with PSL...... 79 5.6 Discussion...... 81 5.7 Summary...... 81

6 Learning with Multiagent Expert Processes 83 6.1 Introduction...... 83 6.1.1 Challenges...... 85 6.1.2 Contributions...... 85 6.2 Problem Formalization...... 86 6.3 Expert Weight Learning for MEPs...... 92

x 117 119 Processes Expert Semantic Automating 8 Automation Web III 103 MEPs & EPs with Learning of Evaluation 7 121 . 119 126 . 132 ...... 116 . . . . Components . . . Meta . Semantic . . . with . Tasks . . . Semantic . Solving ...... 8.4 ...... Automation . . for . Components . Web . . . . Formalization . . 8.3 Problem . . . . 8.2 . . 112 ...... Introduction . . . 109 . 8.1 ...... 104 ...... Summary ...... 7.5 ...... 103 ...... Discussion ...... 7.4 ...... Results . . . . 7.3 ...... Setup . . 7.2 Introduction 7.1 101 . . 101 ...... 93 ...... 96 . . . 94 ...... Summary ...... 6.8 Discussion ...... 6.7 ...... Learning . . Reinforcement with . Coordination Rules Active . Update Heuristic . with Coordination 6.6 . Active MEPs with Coordination 6.5 Passive 6.4 133 . . . 127 . . . 130 . . 129 ...... 127 . . 121 . . 120 ...... Planners . . . Semantic . . . Abstract ...... 8.4.1 . . . . . Agent . . Advice . . . Expert . . Semantic ...... Components . Meta . . . Semantic 114 8.3.4 . 115 . . . . . Experts . . Semantic . . 8.3.3 114 . . . . 114 Base . . Knowledge . . . Structured . 8.3.2 ...... 8.3.1 ...... Contributions ...... Challenges . 8.1.2 . . . 113 . . . 112 . . 113 8.1.1 ...... 113 ...... 105 ...... 104 ...... 106 ...... Choices . . . Design ...... ACoordL . . 7.4.8 . . . 104 ...... 111 . . . ACoordH . 7.4.7 ...... 109 ...... PCoord . 7.4.6 . . . Dependencies . . . . . Meta . . . . of . . . . . Results . . . Qualitative . 7.4.5 ...... (EP-)BatchPSL-All 7.4.4 ...... (EP-)OnEWH-All 7.4.3 ...... (EP-)OnGD . 7.4.2 ...... Optimization . . . 7.4.1 . Outcome . for . . . . Results . of . Estimation . . Summary . Accuracy . for . . . Results . of . . . Summary 7.3.2 ...... 7.3.1 ...... measures . Evaluation ...... Experts . 7.2.4 . . Dependencies . Meta 7.2.3 . . Data 7.2.2 7.2.1 99 . . 98 . . 97 ...... Decisions . Agent ECP Robust . the for . RL Search . Policy 6.6.3 Dependencies Meta Agent 6.6.2 6.6.1 Contents xi

0 Contents

8.4.2 Grounded Semantic Learners...... 134 8.4.3 Grounded Semantic Planners...... 134 8.5 Discussion...... 134 8.5.1 On the Added Value of SEPs...... 134 8.5.2 On The Goal of an Self-Adaptive System...... 135 8.6 Summary...... 135

9 Applications & Evaluations of Semantic Expert Processes 137 9.1 Introduction...... 137 9.2 Medical Assistance Scenarios...... 138 9.2.1 Web Automation for Medical Assistance Tasks...... 138 9.2.2 Semantic Tumor Progression Mapping...... 139 9.2.3 Semantic Surgical Phase Recognition...... 156 9.3 The Named Entity Recognition & -Disambiguation Scenario...... 160 9.3.1 Semantic Descriptions for Semantic NER & NED Experts...... 160 9.3.2 A Grounded Semantic Planner for NERD...... 163 9.3.3 Evaluation...... 168 9.3.4 Setup...... 168 9.3.5 Results...... 168 9.4 Discussion...... 169 9.4.1 On the Generalizability of Semantic Meta Components...... 169 9.4.2 On Domain-dependent Impacts of Rewards...... 170 9.5 Summary...... 170

IV Conclusion 173

10 Summary 175

11 Future Work 177

Bibliography 181

List of Figures 195

List of Tables 197

List of Abbreviations 199

A Appendix: Semantics 203 A.1 Semantic Experts...... 203 A.1.1 Brain Mask Generation...... 203 A.1.2 Standard Normalization...... 204 A.1.3 Robust Normalization...... 205 A.1.4 Registration...... 206 A.1.5 Tumor Segmentation...... 207

xii 213 ...... Agent Semantic A.2 215 . . . . . 214 ...... 208 . . 212 211 ...... 209 ...... Planning . . . . Semantic . . . . . Execution . . Expert . A.2.2 . Semantic ...... A.2.1 . . . . . Disambiguation . . Entity . Named . . . Recognition . . Entity A.1.9 Named . . Recognition Phase A.1.8 . Surgical Generation A.1.7 Map A.1.6 Contents xiii

0

Part I

Foundations

This part consists of the foundations of the thesis. We first introduce the topic of multi- step tasks with expert advice (Chapter1) by motivating the latter and defining our research questions, hypotheses and contributions. After presenting the application scenarios used in this thesis in Chapter2, we explain and define the preliminaries needed for the subsequent content (Chapter3). We finally discuss related works with respect to the contributions of this thesis (Chapter4).

oeal ietacs.T hsed evc rvdrpafrssc as such users: platforms end provider for service advantages are end, several this has fields To of exemplary access. goal direct - the enable to Web pro- to adhere their the publish to on researchers code Here, as available (NLP). totypes Processing made Language is Natural or algorithms Vision Computer of amount increasing An Introduction 1.1 chapters and parts all of outline which an give out finally pointing We specifically 1.6. neglect. by Section we 1.5) in which (Section and thesis on Based the focus of 1.4). we scope aspects (Section the contributions define respective we latter, the Sec- questions the summarize research on in we our challenges introduces which involved 1.3 after Section the hypotheses, automation. highlight and and then learning We to respect 1.1 . with 1.2 Section tion in setting the motivating first In Overview & Introduction 1 Chapter i ies nefcsadhvn utmr a o igeeeuin.Ti rn spce pb nunilIT influential by up picked is trend This executions. Algorithmia funding single (by for Google pay as customers such companies, having and interfaces diverse via hschapter this 1 Eduescnreuse can users End • Nwagrtm en eeoe o existing for developed being are algorithms New • which algorithms, of modularity increasing an is platforms service of effect side A • https://algorithmia.com/ mrvmn.Ti eut nst fecagal loihs hr loihscom- algorithms where algorithms, exchangeable of sets in results This improvement. researchers consequence, a properties. such As hyperparameters foster exposed to applications. incentivized of into are integrated numbers developers and and higher adapted as better well be as can with modularity Services of applications. degrees heterogeneous higher for combinations and compositions enables Art. the of than State rather the problems reengineering unsolved in on time focus of to abundance users an end spending enables This scratch. from lutions eitouetepolmo ovn ut-tptsswt xetavc by advice expert with tasks multi-step solving of problem the introduce we , State-of-the-Art nbet oeiesc loihsb aigte aiyaccessible easily them making by algorithms such monetize to enable erdcberesearch reproducible loihsisedo en ocdt eeo so- develop to forced being of instead algorithms 2 ( cesdo 05/01/2018 on accessed tasks fteei ufiin omfor room sufficient is there if , rwa hma e services Web as them wrap or ) Algorithmia 1 ,which ), 3

1 Chapter 1 Introduction & Overview

pletely or partially dominate others for instances of the task. This drives the develop- ment of better approaches, as dominant algorithms are used more frequently and yield higher revenues.

• Hosting algorithms as Web services is a reliable access mechanism for end users, lead- ing the latter to directly integrate respective services into their applications. By mone- tizing algorithm access, fast and dependable executions of the respective services have to be ensured. As this is especially challenging for algorithms with high computational complexity (i.e. the execution requires large amounts of computational resources), end users profit from outsourcing their executions.

• In addition to reliability, service platforms often provide a wide-range of well-known interfaces to available algorithms, e.g. directly via HTTP methods or via wrappers for common programming languages such as Java, Python and many more. Providing heterogeneous interfaces is a central requirement, as end users are inhibited to use such platforms if efforts for algorithm access and integration are too high.

Real-world applications often approach to solve multi-step tasks, i.e. tasks which require to solve a number of steps and where end users need to compose (or pipeline) different algorithms to generate hypotheses for task instances.

Example 1.1 (A Multi-Step NLP Task). Take, for example, the task of structuring text on the Web as available on social media or in news articles. Given piece of text (i.e. a task instance), one needs algorithms to find named entities (step Named Entity Recognition (NER)), algorithms to find relations among the latter (step Relation Extraction) and algorithms to link named entities and relations to knowledge bases (KBs) such as Wikipedia 3, DBpedia [85], Yago [125] or Wikidata [138] (step Named Entity Disambiguation (NED) and step Relation Disambiguation). The sequence of algorithms generates structured pieces of texts (i.e. task solutions).

Figure 1.1 illustrates such tasks, comprising a number of steps (variable H) which need to be sequentially solved for a task instance to generate a task solution. A number of algorithms (variable N) available in an arbitrary Web repository have to be used to solve the respective steps. Each algorithm needs a set of inputs in a specified format to generate a solution for a task instance. When trying to find well-performing algorithms for a multi-step task, multiple options (i.e. exchangeable algorithms) are often available for a single step. As a result, numerous algorithm pipelines might be possible to configure, where pipelines solve the same steps but potentially produce different results for task instances. For a sufficiently challenging task, there is a high probability that the set of all possible pipelines produce conflicting hypotheses for at least a subset of task instances. In addition to disagreements among pipelines, it is probable that

3https://en.wikipedia.org/wiki/Main_Page (accessed on 05/01/2018)

4 ec siae o h rpsdsltos hc r aet eestimated. be to with have labeled are are which algorithms solutions, to proposed unknown however, steps the is, for from instance estimate) Arrows task dence a assessed. for be algorithm to an has of and correctness The known. be to sumed as avail- multi-step latter of for the subset research to a refer in execute We gap to step). able a a being learning is for only there for algorithms (e.g. able 149], advances constraints , 27 are additional [3, there with especially tasks, (e.g. their tasks, single-step algorithms compare to combine for and and While, set assess training to label. the respective in the instances to task on outputs algorithms execute contain to entails sets cedure Training as to instances. referred task (also new tuples to algorithm an of rectness learning chine different for pipelines different choose to sensible instances good is produce task it pipelines instances, that task Assuming different others. for for hypotheses solutions wrong produces but instances task single a not is there cud o xml,b iie notosalrses hr etis text for descriptions where textual steps, end, this smaller To two processed. further into then divided and be step dedicated example, The a functionalities. for in overlapping tokenized could, wrap algorithms NER for services step as usually mentioned task, task the multi-step solve and a might steps solve steps these on of to for based steps algorithms found needed set of be The the set to have the find algorithms. discover to chosen task, able the multi-step architecture execute a Web correctly solve a to requires steps however, possible This, of possible. becomes tasks step ie htagrtm r vial sWbservices, Web as available are algorithms that Given as- are algorithms available and steps where problem, learning the illustrates 1.2 Figure hoigaogecagal loihscnb eeal prahdby approached generally be can algorithms exchangeable among Choosing Gnrlshm o sn loihst ov ut-tptasks. multi-step solve to algorithms using for schema General 1.1: Figure omxmz h ubro orc ouin o h task the for solutions correct of number the maximize to (short correct uevsdlearning supervised samples eta descriptions textual ieiefralts ntne,ie ieiewrswl o some for well works pipeline a i.e. instances, task all for pipeline ,weetelbli h orc ouin h eea pro- general The solution. correct the is label the where ), .Oeexploits One ). and taxonomies erigproblem learning automatically riigsets training ee ifrn compositions different Here, . . < . ovn ies multi- diverse solving ognrlz h cor- the generalize to akisac,label instance, task weights . Introduction 1.1 uevsdma- supervised ie confi- (i.e. > 5

1 Chapter 1 Introduction & Overview

Figure 1.2: Learning problem of multi-step tasks using algorithms to solve steps. algorithm services significantly complicate their execution, as required inputs and available parameters have to be correctly interpreted. While there is a large body of works with respect to service discovery, -selection, -composition and orchestration, the literature lacks the needed decision-theoretic focus to work towards Web automation for multi-step tasks using algorithm services. More specifically, it is essential to automate solving the learning problem in Web environments, while keeping the architecture flexible for novel algorithms and tasks. We refer to the latter as automation problem. Figure 1.3 illustrates the automation problem, where the needed steps of the task are un- known, and algorithm inputs and outputs need to be interpreted. Throughout this thesis, we refer to such algorithms as experts, as the latter generate spe- cialized hypotheses for a class of task instances. We thereby align our terminology with the seminal work on prediction with expert advice [27], where a decision-maker has to learn how to use the advice (i.e. the hypotheses for task instances) of experts for a task with a single step. Figure 1.4 summarizes our general approach to the learning and the automation problem in terms of a feedback loop of an agent, having to choose experts to solve a multi-step task. The figure illustrates the loop for a single step (variable h, where h ∈ 1...,H). As we argue and show throughout this thesis, Markov Decision Processes (MDPs) [109] are an adequate framework for both problems. MDPs enable to model sequential decision-making problems, where a state (variable sh) requires to take a single action (variable ah), which yields a novel state (variable sh+1) and a numerical reward (variable rh). Reinforcement Learning (RL) is a paradigm for learning to act in MDPs, which we use to solve the learning problem. For the automation problem, we additionally exploit Semantic Web Technologies to flexibly describe, discover and execute expert services. By Planning in a MDP, which returns the best action for a state after the learning problem is solved, we can automatically compose experts to solve

6 egsfralann ytmaecentral: are system learning a for lenges the choose To Learning 1.2.1 execute correctly problem). and automation find the to (i.e. how individual task with for a deal hypotheses then for and expert services problem) expert of learning correctness the the (i.e. assessing instances task for challenges discuss first We Challenges 1.2 tasks. multi-step enwdsrb h epciecalne ndetail. in challenges respective the describe now We Qeyn utpeecagal xet savnaeu,a aigteaeaeo their of average the taking as advantageous, is experts exchangeable multiple Querying 1. ntne,oelssifrainaotterotusadnest elwt resulting with deal to needs and task outputs individual their for information. experts about missing of information executions the loses execute limiting one to By not instances, perspective) step. cost monetary as- a a use for by from experts platforms vote (e.g. available service sensible majority all is as their it However, optimize access, expert to experts. for experts different models of to deal sets weights both [149] or different learning single signing ensemble assess and to [27] learning advice expert with promi- with the specifically, prediction More of experts). available fields the nent to respect (with instances task of ber to (referred hypotheses Atmto rbe o ovn ut-tptssuigalgorithms. using tasks multi-step solving for problem Automation 1.3: Figure best xetppln o akisac,tefloigosrain n chal- and observations following the instance, task a for pipeline expert aoiyvote majority fe eut ncretsltosfralrenum- large a for solutions correct in results often ) . Challenges 1.2 7

1 Chapter 1 Introduction & Overview

Figure 1.4: General approach for solving multi-step tasks with experts.

2. A possible approach for choosing experts and combining their outputs is to ignore that the task has multiple steps. This, however, does not favor the solution for the task, as future experts might perform better for inexact (i.e. semi-correct) in- puts. For the mentioned task of structuring Web text, for example, an expert for the NED step might correctly link Michael Jordan plays to resource http:// dbpedia.org/page/Michael_Jordan but wrongly link Michael Jordan to http://dbpedia.org/page/Michael_I._Jordan, as it confuses the bas- ketball player with the computer science researcher – where Michael Jordan plays is not a named entity but Michael Jordan is. Expert pipelines constructed based on local optimization (i.e. only for individual steps) thus often fail to find global optima for multi-step tasks.

3. Supervised learning often assumes training sets to be available in a batch, where all task instances and labels are available prior to learning. For multi-step tasks with experts, we have to generate novel, expert-specific training sets by executing experts and comparing their hypotheses with the labeled task instances of task training sets. To use batch learning for expert-specific training sets, one has to either (i) randomly execute experts and randomly choose step solutions, where wrong solutions propagate to later steps or (ii) try to execute experts of all steps with correct inputs and artificially create labels for all steps. Both options introduce a learning bias, as generated expert-specific training sets are not representative of the joint performance of all experts involved in the task (i.e. how often their eventually generated solutions are correct).

8 n ops xetsrie o ut-tpts.W o ics h novdcalne in challenges involved the discuss now execute We find, task. to multi-step involves detail. a automation for Web services hypotheses, expert expert compose assess and to learning to addition In Automation 1.2.2 Ee fipt n upt fepr evcsaeaindadsfcetyintegrated, sufficiently and aligned are services expert of outputs and inputs if Even 8. Atrhvn icvrdeprsadesrdtercreteeuin hr fe sthe is often there execution, correct their ensured and experts discovered having After 7. provide usually services Expert 6. We obnn rpsdsltoso utpeecagal xet,oehst ad- to has one experts, exchangeable multiple of solutions proposed combining When 5. re- The instances. task few relatively comprise often tasks for sets training Available 4. ae niis hl nuigta eetdnmdette ontoelp naddition, In overlap. not do entities annotated named of selected number that undefined ensuring an select while to entities, needs named one example, for service, expert NED atta tp fe edvrosipt,weempigaalbedt ftets in- task the of be data to available has the services mapping to expert where of due descriptions inputs, is input various to latter stance need The often steps challenging. that remains fact services expert executing automatically code. integrate glue automatically write cannot manually often to one has ex- consequence, and for experts a formats discovered As correct newly to step. step subsequent a a of of outputs expert perts generated transform to order in inputs) edt eeo additional develop to need consult to requires often latter The formats required converted. as and perts difficult, interpreted systems usually be application is to or which need users description, end the that parame- presupposes understand and thus sufficiently inputs services needed the expert about Executing explanations ters. as about well information textual as of functionality consists available often the description A them. execute and discover can is vote terms majority in weighted correlate the only instances, experts task shared two improved. for If solutions over- instances. correct to solutions) task proposing for new of factor for influence solutions as their weights the expert estimate causes learned which with vote instances, majority task the shared of number large a iinlyda with deal ditionally small, are high-dimensional. sets be training not When must instances instance. task task for across representations a generalize feature of to characteristics learning summarizes a For which define tor to experts). entails to combining learning needs supervised and instances, process choosing task for learning well the work that to is learning supervised converge using for challenge sulting seilywe akdmisaeseilzd(..tssfrmdclassistance). medical for tasks (e.g. specialized are domains task when especially , ufiinl at(..frmlise ak,tersligepr egt have weights expert resulting the tasks, multi-step for (i.e. fast sufficiently xetcorrelation expert lecode glue descriptions ie ovrinmcaim rmdt oservice to data from mechanisms conversion (i.e. w xet ih rps rn ouin for solutions wrong propose might experts Two . uhta n sr rapiainsystems application or users end that such , etr representation feature task-oriented egtdmjrt vote majority weighted hneeuiga executing When . . Challenges 1.2 ..avec- a e.g. , oanex- domain (i.e. 9

1 Chapter 1 Introduction & Overview

one has to take into account that NED experts might make mistakes for correct named entities.

9. There might be an expert class to integrate the expert into a taxonomy. While the latter could be helpful to map experts to steps of the current task, exclusively using it often results in missing possible expert pipelines. This is due to the fact that steps can often be further partitioned, which is hard to automatically (or even manually) discover.

10. As there often exist a number of exchangeable experts for a single step, assessing and weighting them is important to find correct solutions for a multi-step task (as stated for the learning problem). One thus has to integrate adequate learning approaches into a Web architecture. Learning approaches might, however, be tailored to a specific step or outperformed by newer learning algorithms. Fixed procedural workflows thus do not suffice to sustainably solve multi-step tasks, as they lack flexibility.

Based on the challenges for learning and automation with respect to multi-step tasks, we now stipulate our research questions and hypotheses for the contributions of this thesis.

1.3 Research Questions & Hypotheses

We first describe our hypotheses and respective research questions for learning about the per- formance of experts in a supervised setting. All hypotheses and research questions assume a multi-step task with sets of experts for each step, where for at least one step the respective expert set comprises more than one expert. We begin by dealing with learning challenges 1-3 in Hypothesis1.

Hypothesis 1. By learning expert weights for individual task instances which express an ex- pert’s ability to support generating the correct task solution, and by choosing experts with respect to the exploration-exploitation problem, we can maximize the number of correct solu- tions for multi-step tasks.

The hypothesis tackles Research Question1:

Research Question 1. How can we formalize and model the problem of solving multi-step tasks assuming the prevalence of experts for predefined steps and expert execution budgets, where we have access to feedback in terms of training sets for the respective task?

Hypothesis1 thus proposes the basic properties of a general framework for weighting experts for multi-step tasks. To this end, it is central to study when to exploit experts with high weight estimates (i.e. when to choose a promising expert) and when to explore experts with low or unknown expert weights (i.e. when to choose an expert with low or unknown expectations). The framework is the baseline for all residual learning- and automation challenges. By additionally tackling challenge 4, we make Hypothesis2:

10 mlmnain n itrae o ovn ut-tptsso h Web? the on tasks multi-step solving for -interfaces and implementations 4. Question Research 4: Question execute Research and with find associated is to hypothesis sufficient The is services the execute to methods services. HTTP expert exploiting (RDF) as Language Description well Resource as the with modeled annotations lightweight with stances 7. and 6 challenges 4. approaches Hypothesis 4 Hypothesis solving Web. automatically the of on problem experts the with for tasks hypotheses multi-step and questions research with deal now to We the increasing helps also system. process while the coordination weights, of expert resulting flexibility learned (MAS), The from System resulting Multiagent hypotheses studied. a expert to be decorrelate framework can learning the approaches extend coordination to where is 3 Hypothesis of goal The tasks? multi-step for solutions correct of number the maximize 3. Question Research 3: Question multi-step Research for to connected solutions is correct 3 Hypothesis of number flexible. the system resulting improve the can keeping while latter tasks, the solving and problem 3. Hypothesis 3: Hypothesis make we 5, challenge tackling exploration Finally, experts. their representa- and specif- feature balance instances diverse it to task integrate as how of compactly problem, and to tions this enable actions features hypotheses for of Relational suited their impacts exploitation. especially that future and be estimate account task to to respective into deemed how take a is studies to for ically RL has executed and not correct. labels been are have their experts might po- receive several global not As expert’s does the task. one expressing the instance, while Expert solving instance, task tasks. support multi-step each to for for tential weights computed expert be learn to to need approach weights we how describes 2 Hypothesis tasks? multi-step solve correctly to budget execution expert 2. Question Research 2: Question Research to corresponds with 2 tasks Hypothesis multi-step for solutions correct of number the advice. maximize expert to order in budgets expert 2. Hypothesis ie ut-tpts ihaalbeeprs tutrn xet n akin- task and experts structuring experts, available with task multi-step a Given uigrltoa etrsfridvda akisacsealst elwith deal to enables instances task individual for features relational using RL rmn xetassmn o ut-tptssa utaetcoordination Multiagent as tasks multi-step for assessment expert Framing o a etk noacuttecreaino xet nodrto order in experts of correlation the account into take we can How o a elanwiheprst hoei hc tp ie an given steps which in choose to experts which learn we can How o a eacutfrhtrgnosepr ucinlte,- functionalities, expert heterogeneous for account we can How . eerhQetos&Hypotheses & Questions Research 1.3 11

1 Chapter 1 Introduction & Overview

Hypothesis4 deals with the problem of structuring textual service descriptions, as the latter are both cumbersome and difficult to understand for end users and machines. Semantically enriched expert services are a first step towards automating their execution. Hypothesis5 tackles challenges 8-10. Hypothesis 5. Expert services lifted with concepts of the Semantic Web can be executed on a data-driven basis, where decision-theoretic models enable planning their execution as well as integrating expert weight learning approaches to solve multi-step tasks. The hypothesis is associated with Research Question5: Research Question 5. How can the needed set of expert services for a multi-step task be discovered, executed, composed and weighted to solve multi-step tasks with expert advice? Hypothesis5 deals with generating task-oriented solutions using expert weight learning tech- niques for multi-step tasks. By using adequate decision-theoretic models to integrate learning and planning techniques and by using a data-driven approach for expert execution, we can achieve automatic compositions of experts.

1.4 Contributions

We now summarize the contributions for the respective hypotheses. Contribution 1. A decision-theoretic model merging MDPs and prediction with expert advice – referred to as Expert Processes (EPs) – to enable the reuse and further development ofRL algorithms. The contribution to Hypothesis1 comprisesEPs, which are a general purpose framework for multi-step decision making with expert advice. AsEPs are a specialization of MDPs, one can easily reuseRL algorithms and only has to find adequate feature spaces for the expert weight learning problem.EPs are introduced and explained in Chapter5. Contribution 2. RL algorithms using relational pairwise expert feature spaces, which enable to learn expert weights to maximize the number of correct task solutions. We contribute to Hypothesis2 by presenting twoRL solutions for the expert weight learning problem. The feature spaces comprise diverse relational pairwise expert measures (referred to as meta dependencies), which are calculated for experts of same- as well as subsequent steps. Meta dependencies for experts of subsequent steps account for an expert’s ability to support solving the task. We present meta dependencies as well asRL algorithms in Chapter5 after having definedEPs. Contribution 3. Multiagent coordination protocols for Multiagent Expert Processes (MEPs) (i.e.EPs framed as MAS) with appropriate agent coordination techniques and feature spaces, which enable to decorrelate expert weights while keeping the resulting system flexible.

12 n xet fe orlt ngnrtn rn ouin,sc supin onthold. not do assumptions such solutions, wrong distributions generating task data in of different correlate series across often a vary experts on significantly and correctness often average performances or expert minimal- As their instances). (e.g. experts of performance exclude availability general we the end, assume this thus To learning and supervised tasks. task learning a for supervised for sets experts by training of problem subset of a the execute approach only We can one instance. i.e. constraints, high- budget by possible under and tasks approach we problems main goals. the our confining revisiting as by well thesis as the lighting of scope the define We Thesis the of Scope Chap- 1.5 in tasks assistance medical two specific as present well and as 8 task Chapter NERD in 9. the components ter to meta latter semantic the of of concept applications the as given well a as for experts SEPs adequate SEPs find and EPs. weight- of to liftings rithms semantic develop are to which baseline SEP s, the comprises are 5 tasks. Hypothesis assistance task for medical contribution as individual Our well for as experts NLP to semantic applicable adequate are learn- weight methods EP The and and planners instances. compose MDP execute, extending discover, and to reusing methods ers and – (SEPs) Processes Expert NERD- time-efficiency. to their 5. experts evaluate Contribution also semantic and of 9 concept describe Chapter the in and apply experts experts We assistance semantic medical services 8. and Web Chapter define in lightweight formally components We are needed which their vocabularies. experts, structured semantic with to annotated mechanisms) execution and tions by 4 Hypothesis to contribute We tasks. assistance medical as to well flexible as and time-efficient NLP- sufficiently selected is solve of which descriptions, task service NLP lightweight and the (LAPIs) for 7 Chapter 4. in Contribution 3 and 2 -Disambiguation 1, protocols. and contributions coordination Recognition evaluate two Entity for we approaches end, for- coordination the this three comprises them as To 6 coordinating well Chapter by as correlation. MEPs weights their of reduce expert to malization adapting order in for agents methods cooperative other present Multia- with We novel with tasks. study s MEP multi-step to of for enable formalization which the protocols, comprises coordination gent 3 Hypothesis for contribution Our h rtmi rbe eda ihi erigaotepr efracsfrmulti-step for performances expert about learning is with deal we problem main first The itn ehns rmeprst eatceprs sn ikdAPIs Linked using experts, semantic to experts from mechanism lifting A faeok–yedn Semantic yielding – framework EP the into semantics of integration The ehd rmorsoe stelte aesrn supin nthe on assumptions strong make latter the as scope, our from methods ) eatcmt components meta semantic lifting xetsrie ie digsrcuet hi descrip- their to structure adding (i.e. services expert NED. and NER steps with (NERD) nuevsdmcielearning machine unsupervised decentralized hc r erigadpann algo- planning and learning are which , eatcmlise task multi-step semantic ut-tpdecision-making multi-step . cp fteThesis the of Scope 1.5 eintroduce We . (short Named un- 13

1 Chapter 1 Introduction & Overview

In addition to unsupervised learning, we also exclude the theoretical or empirical study of convergence (also referred to as sample complexity) of our learning approaches, which is prominent for evaluatingRL algorithms.RL algorithms are used in this thesis, as they can be applied to the problem of weighting and choosing experts under budget constraints. Our overall goal for the learning problem is to maximize the number of correct solutions for multi-step tasks and we thus assume that, if the respective learning approaches have converged sufficiently fast given the available training sets, the number of correct solutions is maximized. Our last limitation for the scope of the learning problem deals with the sub-problem of re- ducing expert correlation. We cast the problem as MAS and develop techniques to coordinate available hypotheses. However, central problems of interest in MAS coordination deal with investigating constrained communication cases, where agents cannot retrieve the global state. In this thesis, we do not deal with constrained communication MAS, as such a constraint only makes sense if the respective learning system is distributed (and thus decentralized). The essential difference of our goal is that we are only interested in exploiting the coordination process among experts, which introduces novel perspectives on their weights and resulting hypotheses. To this end, we again do not investigate the convergence behavior of our expert coordination techniques, where our argumentation remains the same as before, i.e. if our methods converged, the number of correct solutions for the respective multi-step task with expert advice is maximized. The second main problem approached in this thesis is automatically solving multi-step tasks on the Web, where experts are available as Web services. Our goals are to automatically find adequate expert services for steps, derive plans which only comprise expert services needed for the overall multi-step task, enable automatic querying of the resulting expert services and, finally, integrate expert weight learning methods (as developed for the learning problem of this thesis). Our methods rely on advances in Semantic Web technologies for both structuring data as well as services, but we exclude methods around the topic of reasoning. While rea- soning is a central means to derive novel information, we aim to express (and explore dealing with) uncertainty about the predictive performance (i.e. estimated correctness) of an expert by means of supervised learning. More specifically, we argue that manually created structured annotations cannot compete with learned expert weight approximations based on supervised learning methods. We thus aim to integrate the latter into the automation approach. We also do not deal with reasoning with respect to non-matching semantic annotations for task instances and expert services, which could be successfully used to align them by ontology matching. We assume that a single centralized vocabulary is used for all experts and task instances. Finally, when lifting expert service descriptions to semantic representations and enabling to automatically execute expert services, we exclude problems with respect to security from our scope. The latter are certainly domain-dependent (in medicine, for example, security is highly important) and adequate authorization, encryption or even data anonymization approaches have to be used or developed. 
Such approaches might be highly complex and have to be used in addition to our developed methods.

14 enwotietecnet ftetei,cnitn ffu at – parts four of consisting thesis, the of Automation contents the outline now We Outline 1.6 EPs. solve to Web the on available on services expert using cally - III Part latter. the evaluate and them for methods learning appropriate propose - II Part In thesis. the in proposed methods novel the for - I Part with Proceeding ee edsigihbtenteaiiyt siaethe of estimate number the to maximize ability to tasks. the ability multi-step the for between solutions and distinguish task steps and correct we individual EPs for for Here, experts methods NERD. of learning performance task weight expert the proposed for the MEPs evaluate we chapter, this In - 7 Chapter process. decision novel a solving oeal h td of and study develop the we enable definition, to MEP s - the MEPs on in a Based for resulting methods present – correlation. MAS expert for reducing formalism flexibly EP the extend We - 6 Chapter RL two introduce to and way approximation general EPs. weight a for expert approaches propose for then We representations framework feature advice. purpose derive general expert a with is decision-making which multi-step EPs, for of formalization the comprises chapter The - 5 Chapter regards with thesis problem. this automation of contributions and the learning- for the works to related the with deals chapter The - 4 Chapter for needed technologies and thesis. concepts this baseline of the contributions the introduces and defines chapter The - 3 Chapter and challenges. domain instantiating respective for the thesis highlight and the approaches throughout proposed used our evaluation scenarios application the describe We - 2 Chapter Automation Learning and Conclusion MEPs & EPs with Learning of Evaluation erigwt utaetEpr Processes Expert Multiagent with Learning Processes Expert with Learning Work Related Preliminaries Scenarios epooetofaeok o sesn xet nmlise tasks, multi-step in experts assessing for frameworks two propose we , owr oad automati- towards work to EPs on perspective Web a taking by follows Foundations . passive- n a and eitouetefloigcatr osttebaseline the set to chapters following the introduce we , ciecodnto protocol coordination active Foundations hr h atrentail latter the where , . Outline 1.6 , Learning 15 ,

1 Chapter 1 Introduction & Overview

Chapter8- Automating Semantic Expert Processes We first extend theEP framework to SEPs, where state, action and expert spaces are lifted to semantic representations. We then propose a general Web architecture to solve SEPs, which we describe in detail.

Chapter9- Applications & Evaluations of Semantic Expert Processes The chapter comprises applications of SEPs with instantiations of the Web architecture for NERD and two medical assistance scenarios. We evaluate the respective applica- tions scenarios in terms of time-efficiency and automation capabilities for SEPs.

Finally, PartIV- Conclusion provides a retrospective overview of the content of the thesis and an outlook on future works.

Chapter 10- Summary We summarize the thesis with respect to the stipulated challenges, research questions, hypotheses and contributions.

Chapter 11- Outlook The final chapter gives an outlook on future research directions which result from our work onEPs, MEPs and SEPs for multi-step tasks with expert advice.

Throughout the thesis, we will mention the published research papers which correspond to the contributions of the respective chapters.

16 a eeautdadtse npbil vial riigst n xet,wihhv been have which [131]. GERBIL experts, benchmark and NERD sets Art training the available of publicly State by on integrated tested and evaluated be can as to modeled referred a often to is NER context, this In reduced absence. be the can to NER 0 i.e. to and NERD , sequences) entity solving token for (or decision tokens as binary mapping tagged a to (i.e. need entity only named we a NER, being traditional not as or organization) an or be to have tokens textual entities unique and Wikidata. mentions actually or Yago such is DBpedia, between entity Wikipedia, links as real-world find such which to KBs, important unclear in thus available remains is often It it to. ambiguous, referred as being be to referred can then text is in text of entity piece a in entity named (e.g. a of the instantiation on distinct available A as ways. text, unstructured in ambiguity resolving Web. of problem the with deals NERD Description 2.1.1 Disambiguation & Recognition Entity Named 2.1 to regards with challenges their advice. discuss expert then with and tasks tasks advice. multi-step respective namely expert the describe assistance, with first medical We tasks of NERD. multi-step field the of within Recognition problem scenarios Phase different automation two the with and deal We learning- the both for In Scenarios 2 Chapter ntr,rcie etanttdwt ae niisa nu n a oln h latter the link to has and input as entities named with annotated text receives turn, in NED, if decide to has one NER, In NED. and NER steps the of consists NERD specifically, More hschapter this ae entities Named #MichaelJackson resource epeetteseaisue oapyadeaut u rpsdmethods proposed our evaluate and apply to used scenarios the present we , and r elwrdojcswihcnb etal ersne nanme of number a in represented textually be can which objects real-world are NERD text. a in mention the disambiguates eventually which KB, a in uo rgeso Mapping Progression Tumor annotated o ae entity named for sdsic yeo ae niy(..apro,alocation a person, a (e.g. entity named of type distinct as { 0 , 1 ihe Jackson Michael } ih1rfrigt h rsneo named a of presence the to referring 1 with namely , NLP in scenario one and (TPM), NIL .Hwvr te hnin than other However, ). .A eto fan of mention a As ). eto detection mention Surgical mention . 17

2 Chapter 2 Scenarios

2.1.2 Challenges • We approach learning to assess the performance of experts with supervised learning and thus need an appropriate feature representation. Feature generation for text is challeng- ing since available training sets are small in size. Deep learning approaches [84], for example, need large numbers of samples to converge and are not directly applicable. To this end, one could reuse learned vector embeddings for words (e.g. [92, 98]), which are feature representations learned on large textual corpora in an unsupervised manner. While a vector embedding might generalize well, its availability for a specific word is strongly dependent on used corpus and its dimensionality might be high.

• The NERD task consists of two steps, where the overall performance is strongly depen- dent on NER. If a named entity is wrongly excluded by NER experts, the result cannot be correctly processed by any NED expert. Also, if tokens are wrongly annotated as named entity, the respective NERD task can only be correctly solved if NED experts do not link the tokens to a resource. The latter requires sufficient robustness of NED experts. Lastly, NED experts might better work with inexact inputs, where available named entities are not perfectly annotated.

• For different text distributions, such as tweets or news articles, NED and NER experts often significantly vary in their performance. As a consequence, not a single NER or NED expert usually dominates the rest for each available task instance.

• It is essential to interpret larger amounts of text (e.g. sentences or paragraphs) at once, as the latter contain contextual information needed for NER and NED. Numerous deci- sions for each token or token sequence then influence each other, which makes learning expert weights for NERD more difficult, as different experts often generate correct hy- potheses for different tokens in a single piece of text.

2.2 Surgical Phase Recognition

2.2.1 Description Surgical Phase Recognition (short Phase Recognition) is focus of active research for medical assistance systems [76]. Surgeons today are faced with a variety of intraoperative information sources. Vital signs, scans generated by Computed Tomography (CT) or Magnetic Resonance Imaging (MRI), or various device states are available at any point during the surgery. The problem is that only a small fraction of the available information is actually relevant in a given situation. Showing all information at once is thus distracting and potentially harmful, as the data will surpass human ability to process it [60]. In order to make full use of computerized surgery, we need methods to infer the current phase of the surgery to only present relevant information to the surgeon.

18 omdses hsi u otefc htgibatm rwirglryadmntrn is monitoring and irregularly these grow As glioblastoma physicians. per- spatially different that not by manually fact are taken usually headscans headscans the comparing complex, to visually and by due conducted tedious is generally several This requires steps. which formed patient, a of treatment h mgswr ipae n rcse ihthe with processed and displayed were images Toolkit The Interaction TPM. dimension) a horizontal (the and pa- taken single headscan was a scan the of time used MRIs) the the or to and respect CTs with (i.e. ordered are headscans which registered tient, spatially processed, of consists TPM of development the monitor to have Radiologists Description 2.3.1 Mapping Progression Tumor 2.3 Challenges 2.2.2 the the using provide created experts were clinical where annotations surgeries, The of sets. videos training annotated manually use we testing, using and recognized are activities these (e.g. ativey, terminology or inspected medical are standardized organs on where based labeled are and activities structure used the of consisting features i napoc ovsaietetml rgeso fbantmr o ailgss A radiologists. for tumors brain of progression timely the visualize to approach an is TPM a is recognition Phase Aalbetann esfrPaeRcgiinaesal sol oaneprsaeable are experts domain only as small, are Recognition Phase for sets training Available • experts, Available methods. diverse by approached be can Recognition Phase Surgical • efr ciiisi ifrn re.Ti ae sn aee ugre rmdifferent from surgeries weights. expert labeled and learning using instruments for makes different challenging This use surgeons generally order. They different significantly in surgeries. surgeons activities perform labelling, perform they to way addition the In in differ surgeries. recorded label confidently to expert of execution the automating for challenging is This experts their rule-based as as sets. services, differ, training experts need such for not on inputs ad- do build resulting exploiting or The by activity activities sets. current training of the ditional from sequences generalize for to approaches rules learning heuristic supervised predefined exploit example, for as eerdt as to referred (also .. ugclpae r endoe eune of sequences over defined are phases Surgical Gallbladder>. cut,

2 Chapter 2 Scenarios

Figure 2.1: An example of a MRI headscan. Figure 2.2: An example of a TPM.

The workflow for creating TPMs is illustrated in Figure 2.3. Available headscans are stored in a Picture Archiving and Communication System (PACS) and converted into a common for- mat, either Nearly Raw Raster Data (NRRD) 1 or MetaImage (MHA) 2.A brain mask for the brain region is created (step Brain Mask Generation), ensuring that subsequent steps are not negatively influenced by bones or other structures in the headscan. Step Standard Nor- malization adapts the intensities of MRI scans to ensure that similar tissue types have similar values. If brain masks with additional details have been manually created by radiologists, a more robust normalization can be achieved (step Robust Normalization). All normalized brain masks of a single patient now have to get spatially registered such that the same perspective is used for the eventual TPM (step Registration). Based on registered brain masks, the TPM can finally be created (step Map Generation). An optional step before generating the TPM is to segment tumors to further ease interpretation for radiologists (step Tumor Segmentation). For evaluating and testing TPM experts, we use MRI scans taken within different time steps of the treatment process of a patient.

2.3.2 Challenges • Given that a large number of MRI orCT scans are available for a single patient, correctly composing and executing TPM experts becomes difficult. This is due to the fact that numerous possible mappings result from all available headscans to input parameters of TPM experts. The Map Generation step, for example, requires all processed brain masks (which are needed for the final TPM) as input. If not all headscans should be part of the tpm (based on a constraint given by a radiologist), one has to test all brain mask combinations of lengths 1,...,|n| (with n the number of available headscans). Such brute force approaches are not tractable.

1http://teem.sourceforge.net/nrrd/format.html (accessed on 05/01/2018) 2https://itk.org/Wiki/MetaIO/Documentation (accessed on 05/01/2018)

20 ts ossso tlatfu tp wtototoa emnainse) which step), segmentation optional (without steps four least at of consists task TPM The • gi,aenttatbefrtsswt ag ubr fses soehst repeatedly to has one sequences. as correct for steps, test of to numbers experts large of with comparisons pairwise tasks perform for experts. tractable appropriate not discover) are again, (i.e. find efficiently to requires experts. TPM for Steps 2.3: Figure . uo rgeso Mapping Progression Tumor 2.3 rt force Brute approaches, 21

2

yohss)saeY h ouint aki ucinSolution function (or a output is x arbitrary task an instance a to input to B given solution space a The knowledge background Y. and space X hypothesis-) space input arbitrary an of hc a eotmzdmr fcetyadtasaety ihti nuto,w introduce we intuition, this With transparently. and efficiently more optimized be can which Y. to B and X from information generalize or derive to y instance as solution the returns 3.1 Definition thesis. this of notion our define first We Tasks Multi-Step 3.1 querying and services. creating Web for enrich to methods latter central the with using deal as well we as matter, vocabularies that structured For Web. Semantic the for methods proposed to our and of of to presentation experts aspects turn the for prepare selected To constraints discuss . 3.3 budget we Section in with steps, theory sequential deal multiple to having However, for account 3.2. of tokens Section is the in generalization (e.g. enable introduce instances to tively task means central of A characteristics the on sentence). with based a deal output expert’s first We an of thesis. correctness this and of concepts contributions existing the other explaining of present from as goal to well reused required as not are out and which pointing novel on methods, are Sec- focus tasks in sections multi-step tasks residual to of multi-step the related scope of works, definitions notion the the constrained clarify While To our Web. defining 3.1. the by tion on start tasks we multi-step methods, solve proposed and to our hypotheses, experts i.e. their executing goals, and combine specific finding and two of instances of task terms for in experts experts available choose with tasks multi-step solve to chapter This Preliminaries 3 Chapter ftssaecmlx ti esbet iiete nosalr oemanageable more smaller, into them divide to sensible is it complex, are tasks If e technologies Web learning ensbsccnet eddfrtecnrbtoso hstei.W approach We thesis. this of contributions the for needed concepts basic defines (Tasks) h ehd epeettruhu h hssamt eeaieteexpected the generalize to aim thesis the throughout present we methods The . . edfieats as task a define We ∈ seilyoe nbigmcieraaiiywithin machine-readability enabling ones especially , 3.4 Section in n usto h akrudknowledge background the of subset and X tasks ∈ ihrsett h rbesadcalne prahdin approached challenges and problems the to respect with .W,tu,asm htsligats xlsvl entails exclusively task a solving that assume thus, We, Y. 3 tpeTask -tuple ( = uevsdlearning supervised X , B , Y ) ..i em famapping a of terms in i.e. , : X automating × B ˆ ⊂ B hc eselec- we which , → ,Solution B, decision-making automation .Hne for Hence, Y. h process the learning ( steps x we , , 23 B to ˆ )

3 Chapter 3 Preliminaries multi-step tasks (and from now perceive tasks and multi-step tasks as being equal). As the step structure might get complex, we distinguish between two general types of multi-step tasks, namely layered- and non-layered multi-step tasks. A layered multi-step task assumes the simplest step structure in that – except for start- ing and ending steps – a step has exactly one predecessor and one successor. Although this property significantly reduces the possible structural complexity of tasks, it is often taken for optimization problems (e.g. [77]).

Definition 3.2 (Layered Multi-Step Tasks). A layered multi-step task is a task where sequen- tial steps can be identified such that the task can be divided and a step has one exact prede- cessor and/or successor. Given input space X, we assume that i = 1,...,H with H ∈ N+ steps have to be performed to solve the task, where a step is considered to be a task it- self, i.e. a step is a 3-tuple Stepi = (Xi × Bˆi → Y i) with X1 = X,Y H = Y, and Xi = Y i−1 for i > 1. The solution of a layered multi-step task for an input instance x is thus de- fined as Solution(x,B) = SolutionH (SolutionH−1(...(Solution1(x,Bˆ1)...),BˆH−1),BˆH ), where Solutioni : Xi × Bˆi → Y i solves the i-th step.

Non-layered multi-step tasks can be defined in a similar fashion, where now multiple pre- decessors as well as successors are possible.

Definition 3.3 (Non-Layered Multi-Step Tasks). A non-layered multi-step task is a task where appropriate steps can be identified such that the task can be divided, but a step might have more than one predecessor and/or successor. Given input space X, we again assume i = 1,...,H steps to be performed to solve the task, where a step is defined as before with X1 = X,Y H = Y, and Xi = S Y j if |PRE(Stepi)| > 0 and Xi = X otherwise, Step j∈PRE(Step j) where PRE(Stepi) := {Step j|Y j ∩ Xi 6= /0}, i.e. PRE returns all directly preceding steps of a step. The solution of the task for an input instance x is thus defined as Solution(x,B) = S Stepi(xi) where LAST(Task) := {Stepi|∀Stepi ∈ Task : Stepi(xi) ∈ yH } and xi ∈ Xi Stepi∈LAST(Task) is recursively estimated. LAST returns all final steps of the multi-step tasks, i.e. all steps which are part of the solution.

A crucial property of multi-step tasks is the number of steps H. We distinguish between two conditions for multi-step tasks: (i) It can be solved in exactly H or within [1,H+] steps where H+ ∈ N+ is a predefined upper bound 1 and (ii) neither H or H+ can be set in advance with full confidence. Given that H or upper bound H+ can be confidently estimated, we assume that each step of a task solution is of different nature such that a unique, finite solution can be found. In other words, all possible step sequences to solve the task are countable and finish in at most N+ steps. Hence, if a step has to be repeated on the resulting output space, one cannot ensure

1H+ could, for example, be set according to the longest possible sequence of steps to solve the task.

24 ak,fu ae r osbe ahcs a ifrn elwrdapplications. real-world different has case Each possible. are cases four tasks, 3.1. 3.1 Example Example in case each for tasks example discuss we tasks, step instances. input for repeated H and H 3.5 Definition tasks step h ubro tp rteuprbudo tp H steps of bound upper the or i.e. different, H steps of instance. number input different the a for repeated 3.4 be still Definition could step a consequence, a As set. be can or mutually constraining only bound where upper steps, an of number finite a set. be in can generated bounds upper is probabilistic solution correct a that oilsrt h ifrne nfiie n nnt swl slyrdadnnlyrdmulti- non-layered and layered as well as infinite and finite- in differences the illustrate To fcniin(i od,w antset cannot we holds, (ii) condition If efis define first We • • • o aee,btrqie oecmlxsrcueo steps. is of tasks of structure multi-step series complex overall more a the a or consequence, requires a algorithm but As layered, expert not sequence. an either finite by a can derived in example, experts, algorithms for expert domain ma- images, by to segmented segmented created respect as manually The with such be information. definitions inputs demographic of patient-specific or set additional complex images and behavior, and physical large or a requires terials heart, the as such Tasks: Multi-Step Non-Layered Finite differ usually images) steps. processed is the all sufficiently step for (i.e. was a instances image input the such respective when of the advance) Here, example (in prominent processed. unknown One is it times. where filters, multiple image solved applying be to have steps where Tasks: Multi-Step problems. different Layered tackle a Infinite they for once as solved different, be conceptually to are has steps steps all NERD and the of instance Each task is example 2.1. such Section One in introduced pipeline. as processing NERD, a in steps different conceptually multiple, with Tasks: Multi-Step Layered Finite + different . r nnw,a hycno ecndnl siae,adsesmgthv obe to have might steps and estimated, confidently be cannot they as unknown, are ∀ Mlise ak ihdfeetproperties) different with tasks (Multi-step IfiieMliSe Tasks) Multi-Step (Infinite i Fnt ut-tpTasks) Multi-Step (Finite , N ihi with j + ihrgrst h epcieiptisacs sohrien pe bounds upper no otherwise as instances, input respective the to regards with nt ut-tptasks multi-step finite o h ubro tp fats.W diinlyrqiealsest be to steps all require additionally We task. a of steps of number the for 6= Step j i 6= Step H hr odto i od ..w tlatknow least at we i.e. – holds (i) condition where , . . j ueoscalne nov osletasks solve to involve challenges numerous NLP, In ∨ and/or nifiiemlise aki ut-tpts where task multi-step a is task multi-step infinite An nt ut-tpts samlise akwhere task multi-step a is task multi-step finite A nIaePoesn,oeotnhst ov tasks solve to has often one Processing, Image In x i 6= iuaino,freape ua organ human a example, for of, simulation A x H j . + ofietyadda with deal and confidently + skonadfie,adalsesare steps all and fixed, and known is . ie u ento fmulti-step of definition our Given . ut-tpTasks Multi-Step 3.1 nnt multi- infinite 25

3 Chapter 3 Preliminaries

• Infinite Non-Layered Multi-Step Tasks: We can directly extend the finite non-layered multi-step task example for simulation to the infinite case. For the latter, we might de- fine an infinite layered multi-step task for image segmentation, where a more elaborate sequence of steps enables to apply respective filters multiple times to also normalize the image. Since now a subset of the former finite multi-step task is infinite, the complete overall multi-step task becomes infinite.

Note that task properties (i.e. finite, infinite, layered or non-layered) are often dependent on design choices for steps and/or their solution functions. An important observation is the following: Our task definitions are recursive, as a task is dependent on its direct predecessors. In case of the non-layered property, there are thus no ex- haustive assumptions on the absolute order of solving steps (e.g. they might occur in parallel). When trying to determine the absolute order (or plan) of solving the task, there is a critical distinction to make. In workflow systems, one would derive plans where parallel executions are specifically modeled or retrieved (e.g. via a graph). In a decision-theoretic perception, where actions need to be actively taken to reach a goal state (see MDPs in Section 3.3.2), a single action needs to be taken per time step. Depending on how the environment is modeled, one can represent multiple step combinations as single actions, but the underlying assumption is different. A combined task action (which would simulate parallel steps in a workflow) is only chosen if there is positive feedback from the environment, i.e. if there is sufficient added value (e.g. due to time savings). On a final note, the two perceptions are complementary (thus not conflicting), as one could derive optimized workflow plans after having optimized the decision-theoretic value of available actions. To this end, central properties of a number of solutions for task instances are optimality and near-optimality. Optimality with respect to a task solution function and a set of input instances of the task entails that the respective solution function must not make any mistakes. Near-optimality, on the other hand, assumes a predefined threshold which defines the number of allowed mistakes for a solution function. Note that we define optimality and near-optimality for the solution of the complete multi-step task. The concepts could, in principle, be similarly defined for step solution functions, but would overly restrict task solutions, as only the overall hypothesis has to be correct. Definition 3.6 (Optimality and Near-Optimality for Task Solutions). Given a task (X,B,Y) ∗ ∗ with set of input instances x1,...,xn ∈ X, let y1,...,yn ∈ Y be the correct solutions with respect ∗ to the task. A solution function Solution is optimal with respect to x1,...,xn if it outputs ∗ ∗ ∗ ∗ y1,...,yn as set of solutions, i.e. ∀xi with i ∈ 1,...,n : Solution (xi,B) = yi . A solution function Solution is near-optimal with respect to x1,...,xn if it outputs the correct solution for + ≥ k ≤ n input instances (with k ∈ N ) and n − k < ε with ε ∈ N 0 a predefined error threshold, ∗ i.e. |{x j|x j ∈ x1 ...xn ∧ Solution(x j,B) = y j }| = k. The property is often used for methods in supervised learning and decision theory, and thus similarly holds for subsequent concepts. We do not attempt to prove any values or boundaries

26 hr h osi eoi x if x zero samples is for loss the f x where i.e. of parameters, predictions model of and series variables a random of products on dependent is parameters x model features) learn as to to has (referred variables random Given task. current y f model) samples as given to and (referred otherwise or set graph, as resented 3.7 Definition instance input an to values numerical lows. or classes as of to as assignment (referred to the (referred as sets such training outcomes, diverse assumes one learning, supervised In Learning Machine Supervised 3.2 con- budget expert to due (e.g. information concepts. ap- missing and as define concepts specifically well We selected basic straints). as with introduced decisions having deal sequential After we with learning, next. supervised with of deal proaches we which instances, input making by learning tasks, complete the solve to order possible. in as exchangeable mistakes hypotheses Given few available as the to instance. combine needs to input thus how one each or step, Expert for given tasks. correct a multi-step for for always steps experts not solve to however, able are, are which on hypotheses algorithms – i.e. 1 Chapter empirically, in upon mentioned reasoned and shown be to have instances. input others) available solution to of respect improvements relative with (or functions functions solution of performances absolute that assume for frlann ersinmdl) h eainhpamong relationship non-linear). The (e.g. complex to models). linear) regression (e.g. simple learning from range for might 3.2.4 parameters model Section (see them explain we where number. real a i srfre oa eedn aibeo ae n enstecretslto o x for solution correct the defines and label or variable dependent as to referred is erigaoteprscnb prahdb sn available using by approached be can experts about Learning available assuming tasks multi-step solving with deals thesis The A ε function eg hwn htaslto ucinawy ece eti orcns) erather We correctness). certain a reaches always function solution a that showing (e.g. sacnrlbac fmtosfruigtann est eeaiet oe,unseen novel, to generalize to sets training using for methods of branch central a is observe (or aapoint data GnrlSprie erigParadigm) Learning Supervised (General model auso admvralsadhv to have and variables random of values hscnit of consists thus ) : i .Tegnrlsprie erigprdg a edfie sfol- as defined be can paradigm learning supervised general The ). X = → y i osfunction loss A . φ .Hr,x Here, Y. experts ( j ) o j for atrhvn nrdcdtenecessary the introduced having after 3.3.1 Section in = 1 admvariables random ,..., i 1 learn srfre oa aapito nu ntne and instance, input or point data as to referred is ,..., x L n hnt hoewihpooe hypothesis proposed which choose to when with m unie h orcns ihrsett y to respect with correctness the quantifies : φ × ( x X i . φ , decision-theoretic × y ( ie data Given . uevsdMcieLearning Machine Supervised 3.2 i j n respective and ) ) Y learn ∈ ∈ → R i ( Ξ 1 ) ews olanafunction a learn to wish we , R aee data labeled hr h rdcin f prediction, the where ,..., hi oe aaeesto parameters model their riigsets training experts aspeitosf predictions maps Ξ x hc ih erep- be might which , i ( m ) h atrae–as – are latter The . oe parameters model i ( ihx with j ) lmnst deal to elements φ for ) ( j . ) h osof loss The . i ( Supervised j ) predicting ∈ i R o the for ( x one , ( i ) x 27 i to ) i , , ,

3 Chapter 3 Preliminaries

To this end, models might also have hyperparameters, which are essentially different from model parameters, and might influence both how the model parameters are learned and what the model predicts for a data point.

Definition 3.8 (Hyperparameter). A hyperparameter defines design choices for a model, e.g. value thresholds for predictions or categories given multiple available learning or prediction methods, and can thus be a real number, String or otherwise. A hyperparameter, in general, has to be empirically assigned based on the current task and available input instances of Ξ.

As supervised learning is a very broad research field, we only highlight and introduce se- lected concepts which are relevant to this thesis. We first discuss differences in assumptions with regards to available data and then deal with the learning process itself – namely how learners (i.e. the instance controlling the respective model) receive training data and what this implies for the resulting models.

3.2.1 Model-specific Assumptions

Given data Ξ and a model f , there are two opposing assumptions. The most common assump- tion is that the random variables of f based on Ξ are independent and identically distributed (i.i.d.).

Definition 3.9 (Independent and identically distributed random variables). The i.i.d. assump- tion stipulates that random variables used for a model and instantiated by data Ξ are (i) independent, i.e. do not have any causal effects on each other, and (ii) identically distributed, i.e. share the same distribution with the same model parameters (e.g. same mean value for a normal distribution).

While the assumption is strong, it enables to (theoretically-) study the generalizability of learned models to unseen data in terms of performance bounds (i.e. general statements on near-optimality). However, the assumption might be inadequate if data distributions are changing over time. To this end, adversarial learning takes the opposite stance and does not take any assumptions on random variables.

Definition 3.10 (Adversarial Learning). The adversarial assumption stipulates that no as- sumption can be made about the generating distribution of random variables.

Note that this has essential consequences on the ability to learn from data. We discuss a central adversarial learning approach in Section 3.3.1 and now deal with possible learning protocols, which are closely connected to model-specific assumptions and determine how many training samples we are allowed to perceive before learning or updating a model.

28 apeoefis ae rdcinadte eevstecrepniglbl.Hr,both Here, 3.12 label). Definition corresponding the exist. receives approaches then adversarial as and well prediction as a i.i.d. makes first one sample f of re omnmz h oso f of loss the minimize to order samples adversarial as 3.13 well as Definition i.i.d. both mini-batches Again, in samples model. receive a exploiting exist. gradually of learning, approaches to performance online or predictive model to the the Compared improve update can to once. samples at of mini-batches number predefined the a for wait either to batch f of loss the minimize to f model ( 3.11 Definition set. training complete the observe to allowed points. and data labeled knowledge of performance background ordering input, timely the respective merely with is problem former the distinct and a spaces with output deals latter in the occur – points episode at data label that with assume point we property, this 1 highlight To earlier. occur ( order sequential as in goal. taken fast given be executing as a to (e.g. have reach actions interaction happen to training where by hypotheses), should generated (ii) their be observe updates to once, number to have where models at might large learned data settings, them a training use (iii) streaming of lastly to in and consist possible, possibility available might the be sets eliminate might data might samples (i) which samples reasons: training several of for occur differences These the in while model, – protocols learning learning two between distinguish can one general, In Protocols Learning 3.2.2 x x ,..., 1 i For ial,an Finally, the In data of set the that assume we settings, the between comparison ease To , , y ( y i x 1 ) ..asbe falaalbesmlsi sda once. at used is samples available all of subset a i.e. , ) Z ) nielearning online . sodrdb h ieyocrec ftelte,ie aapit ihsalrindexes smaller with points data i.e. latter, the of occurrence timely the by ordered is ,..., with ( etn,w suealrlvn riigdt onst eaalbebfr eriga learning before available be to points data training relevant all assume we setting, ac erigsetting learning batch x z − ( Z x asmto od,a n is one as holds, assumption i.i.d. the if points), data unseen for loss low (i.e. Z BATCH z nemdaescenario intermediate , y ∈ BthLearning) (Batch Oln Learning) (Online Mn-ac Learning) (Mini-batch z ) N , er oe sn l aee aapit toc omnmz h loss the minimize to once at points data labeled all using f model a learn , y + nielearning online z − eueeioe ssbcitfrdt ons i.e. points, data for subscript as episodes use We . n rdal pae oesfridvda aapit ie o each for (i.e. points data individual for models updates gradually one , Z BATCH z oeta neioei ifrn rmase namlise task multi-step a in step a from different is episode an that Note . ,..., ( x ti eeal airt er oeswt high with models learn to easier generally is it , ) x ( . . z x , . ie aapit n aesosre ni psd ,i.e. z, episode until observed labels and points data Given z y ) etn,w eev h aesgaulytruhu time. throughout gradually labels the receive we setting, z ie psd ,ol use only z, episode Given . xss hr oe sgaulyudtdb a by updated gradually is model a where exists, ) . fz if ie psd n ac ieZ size batch and z episode Given mod Z BATCH = . uevsdMcieLearning Machine Supervised 3.2 0 iibthlearning Mini-batch olano paeamdlfin f model a update or learn to batch ( x z , y z and ) olano paea update or learn to online Ξ ( BATCH x z ihdt points data with , y z nthe In . 
psdsz episodes ) hsentails thus ∈ stedata the is predictive N + batch mini- use , 29 ∈

3 Chapter 3 Preliminaries

Similar to batch and online learning protocols, there are different prediction settings, which we now discuss.

3.2.3 Prediction Settings

The standard prediction setting in supervised learning is that a single input instance xz is given to the learner in every episode z (referred to as single label prediction). However, it might be beneficial to exploit the knowledge of seeing multiple input instances at once. This setting is referred to as pool-based prediction. The latter is especially useful when relations among data points are available (see Link Prediction in Section 3.2.4). We first deal with the single label prediction case, where a classic example is linear regres- sion, which we deal with in Section 3.2.4. Definition 3.14 (Single label prediction). In the single label prediction case, the learner re- ceives data point xz at episode z and makes prediction f (xz). For pool-based prediction, we receive a predefined number of input instances at once to predict outcomes their outcomes. + Definition 3.15 (Pool-based prediction). Given pool interval number ZPOOL ∈ N and episode z, the learner receives data points xz,...,xZPOOL , makes predictions f (xz),..., f (xZPOOL ) at once. An important application of the pool-based prediction setting is collective inference in Graphical Models, where links among graph entities are exploited to generate consistent hy- potheses for multiple properties (see Definition 3.22). We now revisit selected instantiations of mentioned learning scenarios, namely Regression and Link Prediction.

3.2.4 Selected Learning Scenarios We first deal with Regression and assume data points x ∈ X to be represented as vectors, which often consist of manually engineered features. We then turn to Link Prediction in graphs, where we also deal with relational learning.

Regression We first define the general regression setting, where an input instance represented as vector has to be mapped to a numerical value. Definition 3.16 (Regression, Squared loss). Regression entails to map an input instance xi to a real valued number, i.e. to learn a function f : X → R. In the multivariate case, we assume each data point to be represented as feature vector of length NFEAT, i.e. x =< x(1),...,x(NFEAT) >, where x(i) are random variables. A standard for re- T gression is the squared loss LSQUARED(φ,x,y) = kφ x − yk2

30 inlydpneto h ieo riigdt es h eutn opeiyle within lies complexity resulting The sets. data training of size O the O within on thus O dependent is within tionally GD for SGD notation for O iterations. or and Big epochs in of number complexity the computational and representation, The feature the of dimensionality the on but 3.1. Analysis Complexity L samples given When 3.18 Definition specifically more learning, online and edt aeprildrvtvsfrec aibei h oe,i.e. model, the in variable each for derivatives partial take to need φ 3.17 Definition variables. random able ∗ ( ( and i.i.d. be to variables random available assume we updates, gradient the calculate To efcso tnadlna ersinaddiscuss and regression linear standard on focus We a for search we case, linear the In φ | = N ,abtho samples of batch a (BGD), descent gradient batch In • of number a sampling by GD of concept the extends (SGD) descent gradient Stochastic • eest h nieudt,weetemdli individually is model the where update, online the to refers (GD) descent Gradient • , x EPOCH w iteration N points data labeled psd sue o igeudt,i.e. update, single a for used is Z episode pae iheeysml ae nalann aehyperparameter rate learning a on based sample every with updated igeudtsmgtas erpae nepochs. in repeated N be as also might defined updates and single hyperparameter GD, in a updates is batch epochs repeats of usually one converge, to BGD For i , ( y 1 ) i ) ,..., . || N h datg sta o l aapit aet eue o learning. for used be to have points data all not that is advantage The . w FEAT Lna Regression) (Linear ( Gain ecn,Sohsi rdetDset ac rdetDescent) Gradient Batch Descent, Gradient Stochastic Descent, (Gradient N FEAT f || ( Ξ x ) ( = ) | x ) uhthat such . i ( , | y N φ i ) T FEAT IT aentdpneto h ieo riigdt sets, data training of size the on dependent not are SGD and GD ese omnmz h rdeto nabtayls function loss arbitrary an of gradient the minimize to seek we , x ∈ = || N w φ N + ( = IT 1 . ) φ o niiuludts hr n paei eerdt as to referred is update one where updates, individual for | iercombination linear x ierrgeso nal ofidmdlprmtrvector parameter model find to entails regression Linear ) φ ( rdetdescent gradient = udts nteohrhn,aeaddi- are hand, other the on updates, GD Batch . 1 − ) φ + η − w BGD ( η 2 ) GD x ∇ ( φ 2 ∇ + ) Z 1 φ L i ∑ rdetsolutions gradient = Z ... ( 1 . φ EPOCH L fmdlprmtr ae navail- on based parameters model of , + ( x φ . uevsdMcieLearning Machine Supervised 3.2 ( i w , x , y 1 ( x i n , i ) ∈ ) , y x y 1 ( N i ) N ) ∀ ,..., FEAT epochs + φ oeta o standard for that Note . j ) ∈ ( = o ac- mini-batch batch-, for x φ Z hr h number the where , y , ( . y | η N Z GD ) FEAT olce until collected ∈ || R N ≥ EPOCH 0 i.e. , (3.2) (3.1) (3.3) 31 | ) .

3 Chapter 3 Preliminaries

Link prediction for graphs is related to regression, as one wants to estimate the probability of an edge to be part of a graph. It can be approached by Relational Learning, for which we now describe a simple approach.

Link Prediction Link prediction is an important problem setting, where one aims to predict if two nodes in a graph are connected via a specific relation. As graphs on the Web often comprise multiple relations, we first define multi-relational graphs (MRG) as basic representation for Relational Learning. Definition 3.19 (Multi-Relational Graph). Let G = (V,R) be a MRG with set of nodes V and family of typed edge sets R of length m, i.e. R = {R0,...,Rm ⊆ (V ×V)} where r = 1,...,m denote the unique relations in the graph. MRGs frequently occur on the Web (e.g. in social networks or the Linked Open Data Cloud 2) and are often used for analysis. We can now define the linked prediction problem for MRGs.

Definition 3.20. Given MRGG = (V,R), the goal is to predict a fixed relation r ∈ Ri with LP Ri ∈ R for pairs of nodes (vi,v j) ∈V ×V with i 6= j, i.e. learn function f :V ×V ×R → [0,1]. A straightforward Relational Learning approach to learn f LP is using supervised learning approaches for vector-based representation (i.e. by reducing the graph structure).

Definition 3.21 (Relational Classifier for Link Prediction). For nodes (vi,v j), a transforma- tion function f TRANS : V ×V → X has to be developed, where X is a fixed-length feature space. Based on the derived vector-based representation X, any adequate approach can be used to learn regression functions for Linked Prediction, i.e. f LP : X → [0,1]. Such approaches, however, lower the graph structure and thereby lose expressivity, as the relation among nodes in the graph are not fully exploited. The class of Lifted Graphical Mod- els (LGMs) [68] approaches this shortcoming by extending Probabilistic Graphical Models (PGMs) [71] with a relational structure, yielding a Statistical Relational Learning (SRL) ap- proach. LGMs thus exploit declarative representations of knowledge, where one manually models relations among nodes before any learning takes place. Pool-based predictions then enable to jointly predict new relations in the graph for a number of nodes, referred to as col- lective inference. We use a LGM for collective inference to solve the learning problem in Chapter5. Definition 3.22 (Collective Inference). Given a pool-based prediction setting and a MRG G = (V,R), collective inference exploits the graph structure of G to generate joint predictions N+ for a pool of nodes v1,...,vZPOOL ∈ V of predefined size ZPOOL ∈ .

2http://lod-cloud.net/ (accessed on 05/01/2018)

32 h rcso ffgiven f of precision The noKfolds K into into them 3.23 divide Definition to order samples). in of large subsets sufficiently disjoint are (i.e. sets training perform available (ii) as or evaluations, size predefined a of samples tp(..sm nu n uptsae)t mrv h vrl performance. overall the improve to spaces) output and input same (i.e. evaluation step the task in out from pointed mappings be The will and predictions. dependent wrong task is its chapters. negatives of or sets positives the to instances are negatives false and positives given f of recall The parameters model of sets K learning by folds K models. approached on focus be can models learned Evaluating Models Learned Evaluating 3.2.5 different introduce models. now evaluating We for performance. predictive its quantify to order h 1maueo given f of F1-measure The ( 0 3.24 Definition are problem learning samples. available all on f of performance the validate to times parameters x − , salse erc o the for metrics established 9 , Chapter in evaluations our in used tasks multi-step the For efial ics rnho prahst obn vial ere oesfrtesame the for models learned available combine to approaches of branch a discuss finally We false and model, a of predictions correct are thus negatives true as well as positives True generate To to important is it scenario, learning for learned been has model a that Given y 1 FlengtvsFNs: negatives False • Tu oiie TPs: positives True • Tu eaie TNs: negatives True • FlepstvsFPs: positives False • ) ∈ osfunction loss Ξ empirical n ere rdcinf prediction learner and φ Ξ k k o ovldt h rdcin ntelf u od h rcs srpae K repeated is process The fold. out left the on predictions the validate to f for representative ihk with Peiin eal F1-measure) Recall, (Precision, (Cross-Validation) L vlain n s riigst oqatf the quantify to sets training use and evaluations 0-1 Ξ precision ∈ ( sdfie sRecall as defined is φ Ξ 1 , ,..., { x sdfie sPrecision as defined is { { Ξ x , { x x y | x x | sdfie sF1 as defined is | = ) eut o miia vlain,oecnete (i) either can one evaluations, empirical for results x x , | fsize of K ∈ x recall ∈ ∈ ∈ Ξ 1 Ξ Ξ ( . Ξ ∧ { x ∧ ∧ x ) ie riigset training Given ∧ 6= edfietefloig sets: followings the define we , f and y f f ( } f ( ( x ihpsil ucmsY outcomes possible with ) ( x x | = ) K Ξ x ) = ) F1-measure | ) 6= = oeetal etagvnmdlfo aho the of each on f model given a test eventually to cross-validation 6= y y y = | ∧ y TPs theoretically ∧ ∧ . ∧ y φ 2 | y y TPs | k o h iaycasfiainpolm(with problem classification binary the For = Precision = y + rcso Recall Precision n ssK uses One . = = | = FNs True | | TPs False False True . | TPs | Ξ + + } rs-aiainetist split to entails cross-validation , TNs Recall | . uevsdMcieLearning Machine Supervised 3.2 } } } euecosvldto o our for cross-validation use We . or empirically − 1 rdcieperformance predictive = od o erigmodel learning for folds { True efrac metrics performance nti ok we work, this In . , False evaluate } bootstrap samples , folds tin it 33 of Ξ

3 Chapter 3 Preliminaries

3.2.6 Learning Combination Functions over Models Given that a number of models for the same step exist, a combination of all models might yield better results than the best single model in hindsight. The field of Ensemble Learning studies how to learn combination functions over exchangeable models. Here, one usually perceives individual models as black-box models or black-box functions, as the respective learning approach is not taking into account. To learn combination functions, one usually assumes individual models to be mutually independent, such that resulting feature spaces are i.i.d..

Definition 3.25 (Ensemble Learning). Given N black-box models f1,..., fN and data set Ξ for a step, where fi : X → Y predicts a hypothesis for a step, the goal of ensemble learning is to learn a joint function fˆ : f1 × ... × fN × X → Y.

Prominent approaches comprise boosting [17, 48] where additive joint functions are learned, mixture of experts [58] where a softmax layer (i.e. models are weighted exponen- tially to their estimated performance) switches among outputs of individual models or stack- ing [143] where the joint function is learned with generated data sets based on cross-validated black-box models. With respect to the contributions of this thesis, we focus on stacking.

Definition 3.26 (Stacking / Stacked Generalization). Given N models f1,..., fN and data set Ξ, as in the Ensemble Learning setting, the high-level protocol of stacking follows:

• For f1,..., fN, perform K-fold cross-validation on Ξ.

• Use the resulting model-dependent data set {(( f1(x),..., fE (x)),y)|∀(x,y) ∈ Ξ} to learn joint function fˆ .

Based on prior introduced topics in supervised learning, we now turn towards frameworks for decision-making.

3.3 Decision-Making

Other than in the standard supervised learning setting, we now deal with learning settings where (i) feedback is limited and only available for actively chosen hypotheses and/or (ii) multiple decisions have to be taken in a sequence for a single task instance. Taking standard supervised learning as baseline, an agent (i.e. the decision-maker) has to take a single action (i.e. predict a hypothesis) for a state (i.e. a data point or input instance) and gets full feedback (i.e. receives the label). We, from now on, use the notion of states and actions to refer to data points and predictions. Using this terminology eases the integration of prominent works in supervised learning, single-step decision-making and multi-step decision-making.

34 elas well h ieaue omlzto ftegnrlfaeoki oe ihrsett hsthesis. tuple this to respect 3.28 with in novel Definition is studied framework extensively general are the setting of advice formalization expert a literature, general the the within settings presented quently possible account and S space state a Given rule. e simple function or a heuristic is s expert a an be A, example, space action for an also, might it – niques f function box We latter. the 3.27 Definition among learning. choose supervised to from has concepts which on agent, based an experts define to first hypotheses) their (i.e. actions suggest advice expert with Prediction Advice Expert with Prediction 3.3.1 general agents. with cooperating deal independent, finally We exists. advice step expert multi-step in work prior to assume ∈ enwpeettegnrlstigof setting general the present now We eeatfaeok a ectgrzdinto categorized be can frameworks Relevant multi-step S. LtC Let • w Let e • function a is expert an where experts, of set the be E Let • actions. of set the be A Let • Let • states. of set the be S Let • eiinmkn,weetecnrlo the of control the where decision-making, ( loe oeeueprstate. per execute to allowed state. a for performance predictive its state. a for adversarial). vs. i.i.d. (i.e. distribution data the of assumptions S ulfeedback full , xetadvice expert E Γ , A EXP e ∈ , : eiinmkn,weew elwt eea eiinmkn set,a no as aspects, decision-making general with deal we where decision-making, R S i ∆ , ∈ (Expert) ,wihi o osrie osprie erigtech- learning supervised to constrained not is which ), 3.25 Definition (see ( → C GnrlEpr dieSetting) Advice Expert (General S ugtconstraints budget EXP N ) R + etesaedsrbto,wihi togydpneto h underlying the on dependent strongly is which distribution, state the be and , etewih ucinfra xete expert an for function weight the be C eteepr ugt hc eemnshwmn xet h gn is agent the experts many how determines which budget, expert the be ie eea vial oest eeaehptee)adte turn then and hypotheses) generate to models available several (i.e. FEEDBACK . ata feedback partial nepr o tpi ut-tpts steeuvln fablack- a of equivalent the is task multi-step a in step a for expert An sapoietpolmstig hr multiple where setting, problem prominent a is [27] ) . ihrgrst edak oeta,wiealsubse- all while that, Note feedback. to regards with ebgnb discussing by begin We . rdcinwt xetadvice expert with prediction : S → eiinprocess decision ,wihsget nato a action an suggests which A, one-step . h eea xetavc etn sa is setting advice expert general The and ∈ hc eun netmt of estimate an returns which E multi-step : sdsrbtdaogseveral among distributed is S single-step → ,pooiga action an proposing A, . Decision-Making 3.3 hr etk into take we where , eiinmkn as decision-making utaetmulti- Multiagent etnswhich settings ∈ o state a for A experts 35 6 -

3 Chapter 3 Preliminaries

• Let CFEEDBACK : S × A → {E} be a function which return the set of experts the agent receives feedback for.

• Let R : S × A → R be the reward function for an action given a state, which quantifies its correctness (i.e. similar to the inverse of a loss function in supervised learning).

• Let π : S → A be the policy which depicts the eventual action the agent chooses.

The protocol of the general expert advice setting proceeds in episodes z ∈ 1,...,Z with Z as defined before, where in each episode a single state sz is revealed to the agent.

• The agent perceives sz.

• The agent gets estimates we for all available experts.

• Given the expert budget CEXP, the agent executes a number of experts

e1(sz),...,eCEXP (sz) and perceives their suggested actions.

• The agent chooses an action based on its expert weight estimates.

• The agent is revealed the reward of the chosen action, R(s,a), and, based on the current feedback setting, of expert set CFEEDBACK(sz,az).

The goal is to maximize the expected cumulative reward RCUM with respect to the chosen actions.

Definition 3.29 (Cumulative Reward for Prediction with Expert Advice).

CUM R = E[R(s,a)] (3.4)

We now describe different expert advice settings, where budget CEXP and feedback con- straint function CFEEDBACK are varied.

Expert Advice with Full Information We begin by discussing the full information setting, where the agent is allowed to query all available experts and receives feedback about all available actions.

Definition 3.30 (Prediction with Expert Advice). Given the general expert advice setting, we set CEXP = |E| and CFEEDBACK(s,a) = |E|.

A central algorithm which solves the setting is Exponential Weights/Hedge (EWH), which is an adversarial learning approach. The algorithm sets the baseline for numerous extensions (e.g. [3,6, 15]). We now introduce EWH in its simplest form.

36 C lxt ntrso xet n o psds saleprsaeeeue o aheioe the episode, each for executed are O experts is all notation As O episodes. Big not in and complexity experts of terms in plexity 3.2. Analysis Complexity probability. highest the with action w 3.31 Definition ento 3.32 size Definition rewards of combinations receives expert only all where agent [3], the algorithm setting, EWH the this of In extension an by state. approached a for execute experts of to for number allowed the constraining is is agent advice expert the with prediction of specialization natural A Constraints with Advice Expert with deal not do we which performance, predictive thesis. the this for in bounds lower provide to aims thus 1 rbes(hr ata aesrsl rmcntandifrain.I h otx fexpert of context the In information). constrained from result labels partial (where problems the exploration-exploitation is advice expert the with for reward a receives only state. the for executed been C set we setting, [ apdto mapped o hi cin.Tedcso o tpi hnbsdo rbblt itiuinP distribution probability on based then is step a for decision The actions. their for 0 . EXP e 0 , enwbifl ics ifrn osrie aitoso rdcinwt xetadvice. expert with prediction of variations constrained different discuss briefly now We with deals often one algorithms, decision-theoretic for that Note policy greedy the distribution, probability the on Based nte pcaiainfrepr diei eae to related is advice expert for specialization Another h loih hsudtswihso xet xoetal otesae eadobserved reward scaled the to exponentially experts of weights updates thus algorithm The asmsasae eadfnto R function reward scaled a assumes EWH , 1 h paerl ste endby: defined then is rule update The . xctdexperts executed ] o l vial actions: available all for r weighted. are [ − 1 , EXP 1 Peito ihBdee xetAdvice) Expert Budgeted with (Prediction EpnnilWihsHdealgorithm) Weights/Hedge (Exponential ] erigrt hyperparameter rate learning a , | ≤ stersda xethptee eanukon h rbe a be can problem The unknown. remain hypotheses expert residual the as , w E e | ( rbe.Telte scnrlfor central is latter The problem. s n C and EXP4.P z + i nagrtmfroln erig emaueiscom- its measure we learning, online for algorithm an is EWH As 1 = ) P action ( FEEDBACK s z ( w , | a E wt epc othe to respect with EWH extends which [15], algorithm e z ( | = ) s ) tcoss rmnn ouint otxulbandits contextual to solution prominent A chooses. it z . ) exp ( ∑ s , e a ∈ = ) E SCALED R ∑ 1 SCALED e { ∈ { e ( E e s z |∃ w )= η : a e S ( a EWH ( s z otxulbandits contextual = s × } z z , w . ) e e A single- e ( ( oudt h egto nexpert an of weight the update To ∈ . ( π s s → s ) z ( ie h eea xetadvice expert general the Given N z )) } s ) [ z + eun l xet hc have which experts all returns − = ) η n nta egtw weight initial and 1 and EWH , 1 max ] a  apecomplexity sample ut-tpprillabel partial multi-step z hr ead nRare R in rewards where . Decision-Making 3.3 P ( ,weea agent an where [6], s z , a z ) hoe the chooses : S e × ( s (3.6) (3.5) 1 A and = ) 37 →

3 Chapter 3 Preliminaries advice, agents need to successfully balance exploiting experts with high weights and exploring experts with low or unknown weights.

Definition 3.33 (Prediction with Expert Advice for Partial Feedback). Given the general ex- pert advice setting, we set CEXP = |E| and CFEEDBACK(s,a) = {e|e(s) = π(s)}.

We now deal with multi-step decision-making problems, where sequential decisions have to be taken in a single episode.

3.3.2 Markov Decision Processes If an agent has to choose several sequential actions for a single episode, the problem can be formalized as MDP[109], where the goal is to find an optimal policy for acting in the world. We first define the general MDP setting.

Definition 3.34 (General Markov Decision Processes). A General MDP is a 6-tuple (S,A,γ,Tr,R,H).

• Let S be the set of world states and A the set of actions, as defined for expert advice.

• Let Γ ∈ ∆(S) be the start state distribution, as defined for expert advice.

• Letγ ∈ [0,1] be the discount factor hyperparameter.

• Let Tr : S × A → S be the transition function, which depicts the resulting state of the agent after taking an action.

• Let R : S×A → R be the reward function, which returns the local feedback for the taken action, as defined for expert advice.

• Let H be the horizon, which defines the number of actions to be taken in an episode and is defined differently for respective instantiations of the General MDP setting.

• Let π : S → A be the policy, which depicts the action the agent chooses, as in the expert advice case.

The protocol of a general MDP proceeds in episodes z ∈ 1,...,Z with Z as defined before, where in each episode a single start state sz is revealed to the agent.

• For each time step h ∈ 1,...,H, the agent acts based on the state of the current episode h 1 and time step sz , where sz = sz. h – The agent perceives the current world state sz . h – The agent chooses action az . h+1 – Based on the transition function Tr, the agent moves to the next state sz .

38 eado h oiyadtevle fftr tt aus ae ntetasto probabilities. transition the on where based values, a, state action future given of Hence, values the and policy the of reward 3.38 Definition of principles Q Q-function or pairs. function state-action value of state-action the while state, a 3.37 Definition strategy selection by mated MDPs yeprmtri o required. not is hyperparameter H and 3.36 Definition γ 3.35 Definition [ = define [11] equations Bellman The The follows: as are MDPs of properties essential The ,w o en w ae,namely cases, two define now we MDPs, General of definition the on Based • • • • 0 , aeotmlactions. optimal take states: Markovian ulobservability: functions Full transition policies. step-dependent and time reward- learn define directly can or and steps actions, time over of number maximal the about is it policies: non-stationary versus Stationary state. a Rewards: = and lblquality global 1 ] N – au functions value n H and non-stationary + MDPs finite-horizon h gn eevsreward receives agent The nti ae h ubro eiin scntandb n h icutfactor discount the and H by constrained is decisions of number the case, this In . yai programming dynamic V lefunctions) (Value MDP) (Finite-Horizon MDP) (Infinite-Horizon = BlmnEquations) (Bellman ead r ueia auswihqatf the quantify which values numerical are Rewards hc ist aac xlrto n exploitation. and exploration balance to aims which , V ∞ nti ae naetmgtb aiga nnt muto actions. of amount infinite an taking be might agent an case, this In . ( fsae and states of s h = ) ae ntelte,oecndtrieteplc ae nan on based policy the determine can one latter, the on Based . ,a n knows one as MDP s, finite-horizon for case the be might This . ol eed ntecretstate. current the on depends only MDP a in action an Taking easm htw a bev l relevant all observe can we that assume We max a h π ∈ A ( . s R = ) . ( s tt-cinpairs state-action h tt au ucinV function value state The . h , a: . optimal a R . h h au fasaei e codn otecurrent the to according set is state a of value The ( + ) with MDP General a is MDP finite-horizon A . s z h with MDP General a is MDP infinite-horizon An , γ a z h s h rpriso au ucin n osiuethe constitute and functions value of properties ) + ∑ o t action. its for 1 ∈ S Tr fteplc sdpneto h iestep, time the on dependent is policy the If ie nato ae nasae r esti- are state) a in taken action an (i.e. ( s h + 1 | s h , : : a S S h × ) → V A local ( R s → h + unie h au of value the quantifies . Decision-Making 3.3 1 R features au fa cinin action an of value ) eit h quality the depicts infinite-horizon fasaeto state a of action γ (3.7) = 39 1

3 Chapter 3 Preliminaries

Similarly, state-action value function (also referred to as the Q-function) expresses the cur- rent reward and the maximal future state-action value, based on the transition probabilities:

Q(sh,ah) = R(sh,ah) + γ Tr(sh+1|sh,ah) max Q(sh+1,ah+1) (3.8) ∑ h+1 sh+1∈S a ∈A The goal of a MDP is to maximize the reward of the decision process. The latter is defined as expected cumulative reward received over all actions taken (see Definition 3.39). Definition 3.39 (Cumulative Reward for MDPs).

H CUM h h h R = E[ ∑ R (s ,a )] (3.9) h=1 For an inexperienced agent to learn how to act, the environment needs to be explored. Learning by interaction methods are referred to asRL, where the agent has to learn how to deal with partial feedback for sequences of actions.

Reinforcement Learning RL[126] studies how to learn to behave by interacting with the environment on a trial-and- error basis (see Definition 3.40). Definition 3.40 (Reinforcement Learning). Given a MDP,RL is the problem of an agent having to learn a policy π by taking actions a ∈ A in states s ∈ S and receiving numerical rewards R(s,a) in order to maximize the expected cumulative reward.

To this end, the most common way to classifyRL approaches is distinguishing between Policy SearchRL , Value-basedRL and Model-basedRL . We now briefly discuss each type.

Policy Search RL In Policy SearchRL (e.g. [141]; see [99] for an overview), one aims to directly learn policy π and thus omits to learn a value function. This is achieved by observing complete trajectories of a given policy (i.e. a predefined number of transitions s1,a1,r1,s2,...,sH ,aH ,rH ,sH+1) and then learning the model parameters of a policy. Simi- lar to contextual bandit algorithms, action selection strategies are directly integrated into the learning process. We present a Policy SearchRL algorithm in Sec. 6.6.

Value-based RL Approaches in the class of Value-basedRL (e.g. [140]) directly learn the value function for states (V) or state-action pairs (Q). Other than in Policy SearchRL, one needs an explicit action selection strategy (i.e. how to use the learned value functions with respect to the eventual policy) – e.g. the greedy policy, π(s) = maxQ(s,a), where the action a∈A with maximal state-action value is always selected. We present two Value-basedRL algo- rithms in Section 5.4& 5.5.

40 or RL ento 3.42 policy. Definition current the follows agent the which for In functions. ( 3.41 the defining Definition by begin We 3.2.2. Section with align ..dsigihn between distinguishing i.e. to samples more Model-based need For usually which [32]), case. (e.g. discrete exist spaces the converge. action while as to continuous [ 133], well constrained on spaces as works usually action Search RL, continuous are Policy to algorithms possible. adapted RL not directly Value-based is be values can state algorithms of RL values. Actor-Critic calculation state-action direct or but state- schema, the same calculate to 3.38) Definition For (see equations Bellman the use continuous are algorithms RL Model-based or uncertainty approximation. thesis. the model this measuring in MDP covered by current not approached the model usually in a is with error confused Exploration estimated be to not learning. is supervised and function, in transition and reward- the for approximations Here, techniques. function RL Model-based as to model MDP a by learned found we be that can or policy explored a been environment, have the actions of and states possible all that Given Planning be to has policy the but exists z given RL, Search Policy iin oudt prxmtosR approximations update to sitions ( s s h h agrtm eeatt u oki the is work our to relevant algorithms RL for dimension last the Finally, steoeof one the is MDPs in problem central Another stigtu eursaet ocnutimdaeudtso h respective the of updates immediate conduct to agents requires thus setting RL Online The agrtm swehrte elwith deal they whether is algorithms RL of property important an end, this To referred RL, Value-based and Search- Policy both using methods hybrid are there that Note − , a otnosato spaces action continuous Z h RL Actor-Critic BATCH , π r h RL. Search Policy for R , , s a n rniinfunction transition and h ato pcs In spaces. action MDP h + − 1 RL Batch Z ) BATCH oudt prxmtosR approximations update to BthRL) (Batch Oln RL) (Online model RL Model-based , r ) hc ed o elwt nti thesis. this in with deal not do we which 115]), [94, (e.g. algorithms h − ucinudtsaedfre o rdfie ubro iesteps, time of number predefined a for deferred are updates function , Z BATCH eest a to refers mod Online- . , n ol tl iceieteaalbeatosadflo the follow and actions available the discretize still could one , calculated s ie ac yeprmtrZ hyperparameter batch Given h . Z − BATCH Z Tr ie transition Given , BATCH iceeato spaces action discrete Q , RL Model-based for Tr and model MDP rmwihoecnderive can one which from )drcl er h reward the learn directly 87]) 64, [21, (e.g. methods + = RL Batch . 1 ,..., 0 . , Planning Q RL, Model-based for Tr s h ossigo h vial ttsa elas well as states available the of consisting , , a prahs h olwn entosthus definitions following The approaches. h RL Online , Planning r ( h s , MDP. the in h s , h a n a nmrt l cin and actions all enumerate can one , + h 1 , ) r h ttm tph s Z use h, step time at model MDP learned a where , setting. , , or RL Value-based for V s BATCH V h + and 1 ) ∈ ttm tph use h, step time at Q . Decision-Making 3.3 N Planning MDP with , paefrequency update o Value-based for V + n transitions and BATCH discrete π tran- for 41 or –

3 Chapter 3 Preliminaries

Definition 3.43 (Planning in MDPs). Given a model of MDP,M = hS,A,γ,Tr,Ri, planning is the problem to find a policy π by only using the M, without interacting with the world.

Value iteration [11] is a straightforward technique to derive an exact solution for MDPs with tabular state representations. The latter is inefficient for storing large state spaces (as states are enumerated), but describes the baseline for Planning in MDPs. The basic scheme is depicted in Algorithm1, where the Bellman equation for state values is iteratively used for a predefined number of time steps.

Algorithm 1 Value Iteration Require: M = hS,A,γ,Tr,Ri for all s ∈ S do V(s) ← 0 end for for all h = 1,2,...,H do use Equation 3.38 end for

• Line 1-3: Set the state values of all possible states to zero, which requires to know all possible world states.

• Line 4-6: Use Equation 3.38 to update the respective state value, i.e. update the latter based on the current reward and the maximal future state value.

Complexity Analysis 3.3. A single iteration of value iteration (i.e. one time step h) ends within O(|S|2|A|), as one has to calculate the current state value based on possible future states. To this end, sparse transition functions considerably reduce the complexity, where relatively few paths among states are possible. In general, value iteration needs more itera- tions to converge the higher γ is set, as we have to account for future states occurring after numerous transitions.

Important extensions of tabular representations are vector-based MDPs, where states are represented as fixed-length feature vectors, and factored MDPs [53] which use PGMs to ab- stract the state space. Until now, we assumed that a single agent learns to choose experts or actions. We now discuss the multiagent setting for sequential decision-making, where independent, cooperating agents have to take actions.

3.3.3 Multiagent Decision Processes In Multiagent Systems (MAS), multiple agents share control over the actions that are taken. This generates several advantages, namely robustness of the system due to multiple potentially

42 hr nec psd igesatstate start single a episode each in where sdfie as defined is (MMDP) Process Decision Markov Multiagent ( finite-horizon a 3.44 Definition MDPs. single-agent to the analogous to definition our restrict a We MAS. define we 46], Process [20, Decision problems Markov multi-step for decision-making multiagent in work as suitability to learn real-world and or tions logic, distribute modeled. to be ability can parties the various from to constraints due scalability agents, diverse { S poed nepisodes in proceeds MMDP a of protocol The stig scalnigbcueaet utlanto learn must agents because challenging is settings MAS in Learning λ Frec iestep time each For • LtS Let • Let • Ltrwr ucinR rniinfnto radhrznHb enda o finite- for as defined C be Let H • horizon and Tr function transition R, function reward Let • A Let • LtA Let • } λ s MDPs. horizon action. environment last agent’s another for query a of equivalent space –a xml would example an The vote.– MMDP majority action. respective global the the the be on yields dependent actions is environment function all aggregation of choice aggregate the where MDP , a in λ h ∈ , Λ z – – – – where , , Λ Λ λ COMM λ h on action joint The action environment their choose agents All states the of parts their perceive agents All fe eea omncto actions). communication several after actions communication choose to get chance the have agents All λ COMM eteaetspace. agent the be , λ eteevrnetato pc fagent of space action environment the be { etesaesaeo agent of space state the be A a bev.I ih edpneto ae omncto actions. communication taken on dependent be might It observe. can C λ COMM } communicate s ∈ Fnt-oio utaetMDPs) Multiagent (Finite-Horizon etecmuiainato pc fagent of space action communication the be λ 1 λ , N ∈ z Λ ≥ = , . 0 { s etecmuiainbde o ahagent. each for budget communication the be A h λ λ COMM , ,wihsuisbt oriainadcmuiainin communication and coordination both studies which (MMDP), ∈ z . a 1 ∗ ,..., scoe ae na rirr taey(..mjrt voting majority (e.g. strategy arbitrary an on based chosen is ihec te fcmuiaini ugtd ae nprior on Based budgeted. is communication if other each with } λ ∈ H Λ , with R finite-horizon , Tr s z λ , H srevealed. is H hc eemnswa ato h lblstate global the of part what determines which ∈ , C N COMM > 0 z h gnsatbsdo h urn state current the on based act agents the , ae u h nnt-oio xeso is extension infinite-horizon the but case, ∈ λ ) s . . a 1 hr a where , λ h λ h ,..., , , MDP finite-horizon a on Based z , z . Z with λ λ hr a where , ∈ A Z λ . Decision-Making 3.3 MDPs, for defined as ssmlrt naction an to similar is a coordinate λ COMM λ , z ∈ A , h λ COMM Multiagent ie bud- given hi ac- their 8 -tuple sthe is 43

3 Chapter 3 Preliminaries

h+1 h ∗ – Based on the joint action, the agents moves to the next states Tr(sλ,z |sλ,z,a ) h ∗ 3 – All agents receive the reward of the joint action, i.e. R(sλ,z,a ) for agent λ . As all agents have to cooperatively find a single joint action, the global value functions for ∗ a MMDP (i.e. based on the global state sz and joint action a ) are defined equally to MDPs (see Definition 3.37& 3.38). Each agent still has individual state and action spaces, where individual local MDPs could be instantiated to find a policy for the global MDP. The MMDP is defined in general terms, where the communication protocol as well as the reward conditions are left variable. To learn, for example, in MMDPs with perfect commu- nication (i.e. all agents have global state information),RL extensions for MAS exist (e.g. Q-learning for MAS[20]) which can be implemented for each agent, respectively. The sce- nario is referred to as joint-action learner setting.

Definition 3.45 (Joint-action learner setting for MMDPs). The joint-action learner setting studies MAS coordination with perfect communication. Given a MMDP, we set CCOMM = ∞. Note that CCOMM = |Λ − 1| might suffice to query all environment actions of all agents, given that environment actions have to be fixed beforehand.

Definition 3.46 (Independent learner setting for MMDPs). The independent learner setting studies MAS coordination with no communication. Given a MMDP, we set CCOMM = 0. For budgeted communication with resulting imperfect state information, one has to attempt to learn a communication protocol (e.g. [46]). We, finally, turn towards methods developed for the Semantic Web to set the baseline for situating prior decision-theoretic concepts into Web scenarios.

3.4 Semantic Web

The Semantic Web [14] describes the gradual goal of lifting the contents of the World Wide Web (WWW) with semantic annotations to make it machine-readable, where each modeled ele- ment is an uniquely identified resource. It consists of a set of standards and recommendations on how to build a feasible and adequate machine-readable Web architecture. The Semantic Web therefore reduces content ambiguity and enables rich queries to be issued against the Web. Central driving factors for the Semantic Web are a shared model (or syntax) to struc- ture data as well as vocabularies for generating annotations. We first deal with how data is represented in the Semantic Web.

3Note that one could implement a different protocol where, for example, all agents receive the reward of their environment action

44 6 5 4 3 2 1 ealsto to knowledge. enables adhere OWL individuals modeling, that to assuring addition In and relationships, and classes model tensively u 3.47 as Definition represented are graph the of elements all where data, derive to tation The Framework Description Resource 3.4.1 oe eea ttmnsaotpeiae n lse,adtermta relationships. mutual their and classes, and predicates about statements general model (RDFS) Schema RDF Vocabularies RDF 3.4.2 OWL. and RDFS namely expressivity, vocabulary. Friend a of Friend the is FOAF Here, name. assigned an has and person class illustrates 3.2 3.2 Example Example graphs. graph. RDF RDF model small to a language for TURTLE understandable the easily use and we natural work, a this In exist. alizations G triples RDF of set finite a is then b where til stu endas defined thus is triple RDF A object. an ∈ seri- RDF different Web, the across representation graph the share and persist publish, To i the is RDFS than expressive More higher with triples modeling for vocabularies two discuss now we FOAF, and RDF to Next foaf:Person; rdf:type :Patrick_Philipp < http://example.org/> foaf: @prefix rdf: @prefix : @prefix 7 6 5 4 ih eete ujc o bet rapeiae e etesto ln nodes, blank of set the be B Let predicate. a or object) (or subject a either be might U eoreDsrpinFramework Description Resource https://www.w3.org/TR/owl-features/ https://www.w3.org/TR/rdf-schema/ https://www.w3.org/TR/turtle/ https://www.w3.org/RDF/ ∈ a easbeto nojc.LtLb h e fltrl hr l where literals of set the be L Let object. an or subject a be can B TRL syntax) (TURTLE < Rsuc ecito rmwr,triples) Framework, Description (Resource ujc,peiae object predicate, subject, 6 vcblr ocntutsimple construct to vocabulary RDF a is ofnm PtikPhilipp". "Patrick foaf:name example. TURTLE 3.1: Listing . ( emdltosml rpe eciigta eorei of is resource a that describing triples simple two model We cesdo 05/01/2018 on accessed ⊆ e nooyLanguage Ontology Web U (RDF) ∪ ( reason cesdo 05/01/2018 on accessed es D rpeLanguage Triple RDF Terse B < > × s 4 triples ( , U cesdo 05/01/2018 on accessed enstebsceeet fagahrepresen- graph a of elements basic the defines p ( , bu lse n eainhp odrv new derive to relationships and classes about × cesdo 05/01/2018 on accessed o U > n sue ssae ytxt annotate to syntax shared as used is and ∈ ∪ ) U B nfr eoreIdentifier Resource Uniform ∪ ∪ L. . B ) ,where s, URI of set the be U Let × (OWL) U ontologies × ) U ) ∪ 7 (TURTLE) ealst ex- to enables OWL . B ∪ . eatcWeb Semantic 3.4 ealsto enables RDFS . graph RDF A L. ∈ a nybe only can L constraints 5 hc is which (URI). 45 .

3 Chapter 3 Preliminaries

We do not exploit any reasoning mechanisms in this thesis and solely perceive OWL and RDFS as means to model RDF triples. We thus do not specifically require nor exclude their usage, as models need to fit the respective domains and needs. Based on RDF and its TURTLE serialization, we now shortly discuss the related Notation3 language.

3.4.3 Notation3 TURTLE is a subset of Notation3 (N3) 8 which extends RDF with logical constructs. The log:implies predicate 9 is one such example, enabling to model implications of RDF graphs. Example 3.3 illustrates N3 with respect to implications.

Example 3.3 (Implications in N3 syntax). We model a simple implication for the foaf:Person example, where a FOAF predicate is mapped to its inverse predicate in the example namespace. 1 @prefix:. 2 @prefix rdf:. 3 @prefix foaf:. 4 @prefix log:. 5 6 { 7 :Patrick_Philipp rdf:type foaf:Person; 8 foaf:made :Figure1. 9 :Figure1 rdf:type foaf:Image. 10 } 11 log:implies 12 { 13 :Figure1 :madeBy :Patrick_Philipp. 14 } Listing 3.2: Example implication in N3 syntax.

Note that => can be used as synonym for log:implies .

We limit our discussion to implications, as we do not use other N3 constructs in this thesis. We next discuss how to query RDF triples.

3.4.4 SPARQL The SPARQL Protocol and RDF Query Language (SPARQL) 10 enables to query RDF triples. Its syntax is similar to the Structured Query Language (SQL) to query relational data bases. The basic concept to query RDF triples is a basic graph pattern (BGP), which extends RDF with variables.

8https://www.w3.org/TeamSubmission/n3/ (accessed on 05/01/2018) 9http://www.w3.org/2000/10/swap/log (accessed on 05/01/2018) 10https://www.w3.org/TR/rdf-sparql-query/(accessed on 05/01/2018)

46 11 10 9 8 7 6 5 4 3 2 1 sasbe fteSmni e,Lne aadaswith deals Data Linked Web, The Semantic data. tured the of subset a As Data Linked 3.4.5 be to have which Web. stores where Semantic triple (KB), or base files knowledge RDF structured file independent as time. simple several run files a at of RDF integrated be consist or might might stores latter latter triple The the to issued. refer be generally can we query a or the triples which RDF to containing graph RDF the the defines within used are BGPs Philipp’. ’Patrick named person a for resource the find to query SPARQL simple 3.4 BGP Example i.e. patterns, triple of set t a as is defined then BGP triple, RDF a to similar 3.48 Definition enx discuss next We hl the While ; foaf:Person rdf:type } ?s { WHERE FORM ?s SELECT foaf: : PREFIX : rdf PREFIX PREFIX cesbevathe via accessible URIs Use • things. for names as URIs Use • and RDF things. more as discover to such URIs other to standards, links Include using • lookups URI for information useful Provide • frame=single] SPARQL. names. up look ofnm PtikPiip . Philipp" "Patrick foaf:name SELECT TRL syntax) (TURTLE ikdDt Principles Data Linked BscGahPatterns) Graph (Basic ikdData Linked luedfie h aibe hc r ob eund h respective the returned, be to are which variables the defines clause WHERE rpestore triple SAQ example. SPARQL 3.3: Listing . hc ecie pca ls fsrcue aao the of data structured of class special a describes which , tils enwilsrt a illustrate now we triples, RDF modeled prior on Based yetx rnfrProtocol Transfer Hypertext qey The query. SPARQL the of clause i : = . ..agahbsddt tr.Truhu h thesis, the Throughout store. data graph-based a i.e. , r sfollows: as are [18] < e etesto l aibe.Atil atr is, pattern triple A variables. all of set the be V Let s , p : = , o { > t 1 ∈ ..., V t ∪ n } U . publishing ∪ B × oeal epeto people enable to (HTTP) V ∪ U and FROM × . eatcWeb Semantic 3.4 V interlinking ∪ U luefinally clause ∪ B ∪ struc- .A L. 47

3 Chapter 3 Preliminaries

The principles essentially describe how to structure Web data, such that the latter becomes useful for users as well as Web agents (e.g. automated Web services). This is achieved by using URIs to describe individual resources, which can be easily accessed, return structured data describing the resource and link to related resources. Based on the introduced concepts for the Semantic Web, we now turn towards Web services and how they can be lifted to structured representations.

3.4.6 Services in the Semantic Web A Web Service encapsulates task-specific behavior, where an input of arbitrary complexity is processed to generate an output. The service is then deployed on the Web such that it can be universally accessed. We restrict our perception of Web Services to information services [121] and subsequently define the latter (see Definition 3.49). Throughout this thesis, we perceive an information service as the equivalent of a solution function for a step (i.e. an expert) in a multi-step task, exposed as Web service. Definition 3.49 (Information service). An information service is a black-box with set of input parameters, which generates a hypothesis based on a given set of input parameter bindings. Input parameters refer to the needed inputs for the information service which are defined in an arbitrary service description. Information services thus return dynamically derived hypotheses which are calculated during the service call. Executing (referred to as querying) an information service must not change any state of any entity, i.e. no task instances or data points are being modified. Information services thus do not have any side effects.

With the rise of the Semantic Web, the idea to enrich available description languages and execution frameworks with semantic concepts came up naturally. Numerous approaches to Semantic Service architectures and -description languages exist (see Section 4.3.1 for an overview and a discussion). We focus on an instance of Semantic Web Services [121] called Linked APIs (LAPIs) [121, 122], where API refers to Application Programming Interface, corresponding to the access to a Web Service. Definition 3.50 (Linked API). A LAPI is an information service for which the following con- ditions must hold: • SPARQL graph patterns (i.e. BGPs) are used to describe inputs and outputs of the service.

• Use RDF as communication means by RESTful content negotiation.

• The output should explicitly provide links to the inputs.

LAPIs enable to encapsulate complex functionalities and make the latter easily accessible by exposing preconditions and postconditions as Linked Data. Here, conditions are modeled as BGPs, where the same variables should be used for preconditions and postconditions to

48 esn naseie syntax. content specified structured a send in to sent HTTP require be them, LAPIs call If correctly serializations. to RDF order specific in to link description. and its request or to LAPI respective the by HTTP provided content Web. enabled the is on communication RDF services outputs. generated and via inputs between relations the express ersnainlSaeTransfer State Representational GET eusscnte edrcl sdt ereestructured retrieve to used directly be then can requests mtosfor methods HTTP of use the promoting [42], (REST) POST or PUT gah can graphs RDF where used, are requests otn negotiation Content . eatcWeb Semantic 3.4 ee enables here, , 49

3

tl elwt partial a with step. deal each still for s EP executed s EP be both EP s. can in in experts all chosen actions not available be as all to problem, for needs label rewards action receives (iv) single one a steps, CDPs, While multiple and over problem. processes bandit decision contextual with multi-step chosen deal the are we rewards only Processes as but and Decision addition, executed, hypotheses Contextual are In expert experts of rewarded. all subset where is a [6], action only bandits (ii) contextual and need (iii) queried with we and be overlaps experts observed, can querying are experts for There all budget not a steps. where is inbetween there decisions since However, take to experts. queried (i) all frameworks. of decision-theoretic wards several to advice related expert is with and experts), (i.e. actions Advice Expert Multi-Step with Decision-Making Approaches Decision-Theoretic & Learning- ranking and 4.1.1 selection service on weights. We focus service EP expert then 7). the learn We & to proposing enable 5 techniques. by which (Chapter SAS techniques start experts related and for to problem weights approaches learn learning our well-established to the into algorithms EP s with situate RL first deal two thus we as well thesis, as this framework of II Part In Systems Single-Agent in Tasks Multi-Step 4.1 service available the to for regards systems. works with workflow related work and discuss our frameworks finally align decision-theoretic approaches We we descriptions, MAS where tasks. ), 4.3 between MAS (Section distinguish for problem approaches and automation MAS 4.2) with and (Section deal tasks MAS then SAS We in for works. tasks service-related multi-step and for learning works learn- between in the distinguish tasks for further work multi-step we related with where discussing begin by start and We problem advice. ing expert with tasks multi-step for lem In Work Related 4 Chapter hschapter this edsusterltdwr o h erig swl steatmto prob- automation the as well as learning- the for work related the discuss we , tde n-tpepr die hr eaeal oosretere- the observe to able are we where advice, expert one-step studies [27] ,a hyapoc the approach they as EPs, to related strongly are [77] (CDPs) decision-theoretic igeAetSystems Single-Agent hr n gn otosall controls agent one where SAS, a to maps adcompare and SAS of frameworks ugtdepr advice expert budgeted 4.1, Section in (SAS) Prediction [3], 51

4 Chapter 4 Related Work

Knowledge-based trust [35] extends knowledge fusion approaches [34], which combine the outputs of multiple triple extractors and is thus an instance of (i), where a combination func- tion over experts is learned based on heterogeneous data. While we do not deal with problem settings where input data might be flawed, our approach could be integrated into knowledge- based trust by providing accurate confidence measures for available triple extractors. To determine the accuracy of experts in an i.i.d. setting, [105, 106] use pairwise error independence measures between binary classifiers without requiring labeled data (i.e. they use unsupervised learning) and also fall under (i). We do not solely rely on unlabeled data, but leverage collective inference to integrate labeled and unlabeled data. In addition, experts are allowed to perform worse than average inEPs, which is not possible for the unsupervised case. The algorithm selection problem has widely been studied for combinatorial search prob- lems [75] or within automatic (AutoML)[41, 128], and was originally stated by [111]. It deals with weighting a set of available algorithms (similar to experts) based on available training sets, where to goal is not to execute all available algorithm configurations while learning. The algorithm selection problem thus is a special case of (ii) and (iii), as high- dimensional state spaces are dealt with and a budget is imposed during the learning phase. AutoML entails automating feature selection, algorithm selection and hyperparameter opti- mization and thus falls under (iv), as (in addition to algorithm selection) one has to deal with multiple steps. While, for example, used Bayesian optimization techniques share numerous overlaps with strategies to balance exploitation and exploration needed forEPs, we focus on guiding the decision process for single states, where a single state often entails to take a num- ber of decisions (referred to as decision candidates) – in NLP, for example, a single state or data point is a piece of text, where individual decisions might have to taken for each token. More specifically, the set of available hyperparameters in AutoML is assumed to be known in advance, which is not the case for decision candidates inEPs. While ensemble learning deals with learning functions over individual models, meta- learning [22] aims to optimize the application of experts to new data sets based on perfor- mance histories and meta-features describing the data sets. Both types of approaches can be seen as batch versions of (i). Our work, however, focuses on selecting experts for specific data points, which requires finer-grained expert weight functions and novel feature representations. We finally distinguish our work with respect to state representations used for modeling the decision process. Tabular representations are the most basic, quickly yielding large state spaces. To this end, Relational Markov Decision Processes (RMDPs) [38, 65] enable to fac- torize state and action spaces by allowing relational structures. The resulting representations are compact and can be used with relational extensions of prominentRL algorithms – broadly referred to as Relational Reinforcement Learning (RRL) (e.g. [37, 38, 65]) – or Planning al- gorithms, such as a relational lifting of value iteration [66]. 
InEPs, we also exploit relational state- and action spaces in terms of meta dependencies, which are pairwise expert features for single decision candidates, but our goals for the learning problem significantly differ from RRL. We exploit relational measures between experts and data points, where the goal state

52 ,weeaet r raie nheacia tutrssc that such structures hierarchical in organized a are agents of where function 33], combination [ 9, a (HRL) Learning by ment The integrated eventually process. model-based, learning are (i.e. the and scenario centralized up RL search) speed the policy on to or depend is agents value-based goal specialized the individual function, the of reward predictions complex a of parts ferent the within problems, multi-step For Tasks Single-Agent for Systems Multiagent 4.2.1 MAS to turn advice. subsequently and expert tasks multi-step single-agent . to for tasks related MAS Multiagent are to for approach which with our WeMAS deal compare systems. in specifically first real-world advances We not numerous with does for deal assumption adequate be additionally single-agent not now the might that and argue correlation we expert resolving , 7 & 6 Chapter In Systems Multiagent in Tasks Multi-Step 4.2 do and candidates decision individual for problems. parameter learning weights QoS multi-step learn for with dependent not optimize deal case, not do only our however, we in methods, as are, prior points, latter The All data specific measures. service of pairwise neighborhoods – on others among – by sessed [113]. cross-validation via sets, calculated data estimate to available accuracy problems on static are multi-step based a logs simplify by services historic to analytics where is combine trees, approach and decision simple select of a To rank use To [26]. the rules enable [57]. as also modeled settings latter The information limited [88]. rankings parameters retrieve for appropriate QoS to studied select used measures, also be to pairwise is can without way which measures services effective 148], service an pairwise [147, vein, services is similar of matrices a similarity In user [129]. services on based alleviate prediction To used. link is parameters QoS use of combination to weighted by way is ranking straightforward service A for parameters frameworks. decision-theoretic of To experts of Approaches combination Service or expert 4.1.2 which candidates). clear decision its not (and is state given it a for i.e. action correct priori, the a suggests known explicitly not is l ro ehd hr iiaiist u prah hr evcsaeidvdal as- individually are services where approach, our to similarities share methods prior All select and eaain fConcerns of Separations eaagent meta rank prpit evcs ifrn prahseitwihalfl ne (i) under fall all which exist approaches different services, appropriate h rmwr eeaie mn tes–Heacia Reinforce- Hierarchical – others among – generalizes framework The . centralized rmwr.B aigseilzdaet er dif- learn agents specialized having By framework. (SoC) ipeAdtv Weighting Additive Simple clustering ] 134 [82, proposed been recently has learning multiagent ehiussc as such techniques . ut-tpTssi utaetSystems Multiagent in Tasks Multi-Step 4.2 evc bundles service prerepresentations sparse ,weetepreference- the where 145], [5, k-Means Quality-of-Service n egttelatter the weight and a eapidto applied be can accuracy fusers, of (QoS) 53 .

4 Chapter 4 Related Work agents higher in the hierarchy select agents beneath them. In MEPs, we do not assume a cen- tralized meta agent to coordinate all agents, which requires agents to actively coordinate their actions. We additionally approach multiagent coordination in a designated decision process, where available agents can adapt their actions several times until the joint action is fixed.

4.2.2 Multiagent Systems for Multiagent Tasks A central aspect for cooperative MAS is the degree of information available to individual agents. In the independent learner setting (see Definition 3.46), agents attempt to coordinate without the knowledge of other actions and are not allowed to communicate [61, 62, 90]. In the joint-action learner setting ((see Definition 3.45)), actions have perfect knowledge about all taken actions [25, 29]. Here, the joint action is rewarded for all agents, but penalization might occur if no consensus was reached. For the joint-action learner setting, managing agent communication is a central problem, where communication can be modeled as separate ac- tion space (e.g. [46]). In this work, we assume the joint-action learner setting and assume communication is either assumed to be perfect (e.g. due to a centralized controller who only distributes information) or has already taken place in an optimized manner to yield perfect information. Approaches to decentralized Multiagent multi-step coordination problems [20, 83] par- tially extend MDPs by partitioning the action space according to available agents, resulting in MMDPs (see Definition 3.44). Distributed variants of Q-Learning are often used to learn both independent as well as joint-action policies [25, 83]. In general, MAS involving autonomous coordination among cooperating agents have been studied for different domains and tasks, such as work load allocation [74] or goal searching robots [146]. In a similar vein, Multiagent Relational Reinforcement Learning [30, 107] extends RRL to the multiagent case and thus enables speeding up the learning process by compact relational state and action spaces. An essential difference in MEPs is the possibility of active coordination, which is an additional decision process to enable agents to react to the current weighted majority vote. In order not to lose expressiveness, active coordination assumes continuous action spaces which enable agents to adapt the weights for their experts. Here, policy-basedRL techniques often work better than value- or model-based ones, as convergence is usually faster. Swarm intelligence approaches for decentralized Multiagent coordination [146] focus on aligning single-agent strategies with the current weighted majority vote. The goal is to exploit high local agent weights to influence non-informed cooperating agents (i.e. agents with low local weights). Active coordination, as pursued in MEPs, is directly inspired by swarm in- telligence, as agents either persist on their experts’ strategies or follow the weighted majority vote, thus aligning their actions. We, however, extend techniques proposed in [146] to take into account the current context (i.e. available state information and pairwise agent behavior) as well as to learn policies in aRL setting. From a Web service perspective, MAS involving autonomous coordination among coop- erating services [1, 54, 116] work towards self-organization, where Web architectures, agent

54 05/01/2018 precondition. a and list delete a list, add an of consist to actions requires STRIPS states. given for The Applications & Frameworks but Decision-Making work our 4.3.2 with compete not does it metadata. provenance executions, be generated LAPI might our of extends OPMW potentially outcomes As the document metadata. provenance to workflow-related used describe to elements the essential on based is which ontology, [50] (OPMW) descriptions. the service on structured We and for BGP s, [123]. elements as by modeled out are Model postconditions pointed con- Service and as Minimal with preconditions mappings SAWSDL where input/output LAPIs, extends on inexact which rely from [137] thus suffer WSMO-Lite – to effects similar and – evenditions models and complex services require both simpler However, for descriptions. expressive more model to end, enabling this To outputs. and inputs between the relationships functional describe to WSDL-S neglects on however (based overview).WSDL complete to a extension for tic [135] machine- (see towards work step our important to closest an the the depict restrict are latter we and the overview), readability as an descriptions, for semantic 110] [ to see [28]; comparison (WSDL) Language covering Description works Service of Web body large a is there While Languages Description Service Web Semantic 4.3.1 advice. expert multi-step for problem automation (iii) Web and (i) decision-making the into with works deals related divide thesis We this of III Part Web the on Tasks Multi-Step 4.3 develop to self-organize. is to system- problem agents enable learning on which the put estimation, for weight work is expert our however, for of focus, methods idea supervised The central The developed. aspects. composition are and protocols communication and types lsl once osrielnugsi the is languages service to connected Closely Language Description Service Web for Annotations Semantic rbe solving problem 3 2 1 e evc oeigOntology Modeling Service Web http://open-biomed.sourceforge.net/opmv/ns.html http://iserve.kmi.open.ac.uk/ns/msm/msm-2014-09-03.html https://www.w3.org/Submission/WSDL-S/ tnodRsac nttt rbe Solver Problem Institute Research Stanford ) ..afra agaet ecieatosadapanrt hoeactions choose to planner a and actions describe to language formal a i.e. , ecito agae[73] language description (MSM) oko systems workflow ecito languages description OWL, on based are [89] OWL-S and [112] (WSMO) . 1 ,which WSDL ), for extension semantic early an , o-eatcWbsriedescriptions service Web non-semantic pnPoeac Model Provenance Open opie l functionalities all comprises [43] (STRIPS) ( cesdo 05/01/2018 on accessed pnPoeac oe o Workflows for Model Provenance Open o e evcs (ii) services, Web for 2 ihwih nooywt base with ontology lightweight a , . ut-tpTsso h Web the on Tasks Multi-Step 4.3 ( cesdo 05/01/2018 on accessed saseman- a is 72] [ (SAWSDL) ) (OPM) frameworks ( cesdon accessed 3 n provides and ) eg the (e.g. for 55

4 Chapter 4 Related Work

By adding or deleting state conditions, one can express impacts of actions. STRIPS is the baseline for multiple decision-making methods, such as Problem-Solving Methods (PSMs) or MDPs, which we now integrate into our work. PSMs [12, 108] are closely connected to STRIPS as well as to our work. PSMs describe highly parameterized algorithms which, in conjunction with expert knowledge, are able to solve real-world tasks. Similar to our work, PSMs are described in terms of functional spec- ifications, requirements and operational specifications (i.e. the preconditions and postcondi- tions). To this end, the Unified Problem-solving Method Language (UPML)[40] enables to graphically model and semantically describe PSMs, which supports both reusing and adapt- ing available PSMs. While several PSMs can be easily composed and combined to solve multi-step tasks, the resulting learning problems are not dealt with. Usually, Planning (see [55] for an overview) involves finding a policy which suggests the optimal action for each possible world state. For tasks involving uncertainties with respect to action outcomes, stochastic STRIPS enables to assign probabilities to actions. To this end, MDPs deal with learning the parameters of stochastic STRIPS tasks, which we use as decision-theoretic baseline for our work. Planning techniques have been applied to Semantic Web services before (see [70] for an extensive overview). Approaches to enable dynamic orchestration of Semantic Web services comprise – among others – Hierarchical Task Networks (HTN)[69, 118] or OWL reasoning [117]. MDPs have been applied to compose non-semantic Web services [36], but the reward function was solely dependent on static Service Level Agreements (SLAs). There are, to the best of our knowledge, no works for Semantic Web services and MDPs to deal with dynamic rewards based on task instances, as is the case for multi-step expert advice. We thus introduce SEPs which extendEPs with semantic annotations and allow to further distinguish among abstract and grounded tasks. To this end, it is important to point out that SEPs share several similarities to an extension of RMDPs to multi-step expert advice on the Semantic Web. Here, Logical Markov Decision Programs (LOMDPs) [65] assume manually modeled state- and action spaces, which might consist of numerous abstract or grounded relations. Logical rules are then used to define (stochastic-) transitions and assigned rewards. In SEPs, semantic experts can be interpreted as to providing similar STRIPS-like rules, consisting of RDF triples. We show that, equal to LOMDPs, SEPs can be reduced to discrete MDPs but do not exploit relations to speed up Learning or Planning. Using relational MDPs, such as LOMDPs, could thus improve planning in SEPs if one can guarantee a direct mapping to the former. We discuss this challenge in detail in terms of future works of the thesis (see Chapter 11).

4.3.3 Workflow Systems There is ongoing research in workflow systems which enable to describe and execute experts (and expert services) of different kinds. Pegasus [31] is a workflow management system able to map abstract pipelines for simulation data analysis to distributed computing environments.

56 nesv itiue optn plctos ial,Wbsriewrflw r approached are workflows the service Web by Finally, applications. computing distributed intensive otx rpoes n hi btatrpeettoscnb hrdars h e through Web the another across in shared reused be be can can representations components abstract their The and process, processes. or real context of OPMW. instantiation by annotated dynamic be Weenable might which workflows. experts, semantic test by and covered execute not also to is we sets which end, training this tasks, on To step rely problems. multi automation for environmental and optimization on learning with based our are deal approach which to MDPs, also enable on We and work generation. feedback our on provenance build based for but ontology sources techniques OPMW Planning data mentioned employ and prior components uses and matches con- requests and automatically user conditions also specifiy framework to The require descriptions but straints. semantic Learning, Ensemble Generic by environments. experts combining distributed support large-scale in components of reusing sition from benefit greatly projects. could prior architecture of Our services of NLP use uncertainty. annotated the under semantically in plan differs and but combine Galaxy, to and Taverna both with overlaps JSON-LD using described semantically successfully been the and services for well-defined Similarly,used on based experiments optimize points. workflow to reproducible data learning and single supervised using for to selection regards the with where Taverna queries, from SPARQL diverges constructing equals architecture then our selection Service services. (NLP-) the describe in auto- used and successfully sources the was Taverna distributed capture from users. project for data better process integrate to creation to com- workflow used able new the also are mate of is Taverna RDF integration scientists. in the the modeled of easing view descriptions thus semantic and Lightweight interfaces service ponents. generic creating by totyping exploit or candidates) decision (i.e. points services, data single compose annotations. for semantic to decision-making approaches on focus sophisticated not are do systems but workflow mentioned All tutions. computing. distributed for especially grids. streams, of data use for Makeflow the workflows enable easing to thereby framework XML, based via directly or interface computers. vein, similar a In to OWL , using formalized models, domain as created are workflows abstract [144], In around centered work The Taverna 5 4 6 7 http://www.panacea-lr.eu/ http://langrid.org/ http://www.lappsgrid.org/ http://json-ld.org/ LanguageGrid 5 to RDF using of terms in Taverna to related strongly is work Our tasks. NLP for nbe odfiewrflw ae on based workflows define to enables [2] ,o h te ad sasinicwrflwsse uprigpoespro- process supporting system workflow scientific a is hand, other the on [97], AKSALON AP rdproject Grid LAPPS FireWorks project nbe odfiegi oko plctosvaagahcluser graphical a via applications workflow grid define to enables [39] ( ( cesdo 05/01/2018 on accessed cesdo 05/01/2018 on accessed oue nealn okost eeeue nsuper- on executed be to workflows enabling on focuses [59] 4 eatcworkflows semantic eprsaeitgae rmvrosinsti- various from integrated are experts NLP diverse where , 6 srie aebe implemented, been have services NLP numerous where , ( ( cesdo 05/01/2018 on accessed cesdo 05/01/2018 on accessed 7 n ulse nteWb u oksae many shares work Our Web. 
the on published and Galaxy ) ) ist nbeteatmtccompo- automatic the enable to aims [52] nxMake Unix ) ) nbe ocnutsustainable conduct to enables [19] . ut-tpTsso h Web the on Tasks Multi-Step 4.3 hc r utbefrdata- for suitable are which , dispel4py eatcmt components meta semantic saPython- a is [44] PANACEA 57

4 Chapter 4 Related Work

OWL classes. The authors rely on triple patterns to select appropriate annotated components for steps, but only exploit owl:subClassOf relationships to enable generalization. The approach thus does not deal with uncertainties. With SEPs, we are able to use lightweight se- mantic annotations (in contrast to OWL), while also generalizing the performance of experts usingRL, assuming available training sets. Finally, automatic composition of analytical workflows is studied in [16]. The system es- sentially uses a planner, a leaner and a largeKB to solve tasks. A large amount of potential workflows are taken into account to answer a user specified query with the optimal choice. The decision process comprises complex Learning and Planning approaches, and entails ex- ploring large feature spaces. Lastly, atomic actions are lifted with semantic annotations to better adapt to user queries. The proposed system, similar to our work, assesses experts in terms of labeled data and uses a planning engine to compose experts given a user query. The system, however, does not enable flexibility with respect to available learners and planners. In addition, weights for experts are not adapted for individual decision candidates, as the system relies on global ensemble functions. Finally, central properties for optimizing multi-step tasks are ignored, such as the impact of current hypotheses on future steps.

58 Part II

Learning

This part of the thesis deals with the learning problem, where we deal with research ques- tions about choosing and combining experts for multi-step tasks in order to maximize the number of correct solutions. We begin by defining the central framework of this thesis –EPs – and subsequently present twoRL approaches for learning expert weights (Chapter5). We then present an extension ofEPs to the Multiagent setting – i.e. MEPs – where we present methods to coordinate expert weights in order to reduce expert correlation (Chapter6). We evaluate our approaches for bothEPs and MEPs in Chapter7 based on the NERD scenario.

fradtie icsin.Sil oeo h rmwrscvrtepolmof problem the cover frameworks the of none Advice (see Still, Expert CDPs Multi-Step discussion). and with bandits Decision-Making detailed contextual a advice, for expert 4.1.1 be budgeted Section to advice, needs expert with which are prediction correct, step ing is last solution the eventual but weights. the each their as learning in when long experts account as so, into more errors taken Even small make output. to expert’s allowed an of correctness the on a in perform experts available experts of queried subset the task. of a hypothesis multi-step correct choose the weight mistakes the solve (i) find to of to (ii) to needs supports to number which and order one average budgets expert in end, the with points this deal Here, to To data experts individual points. distributions. for multi- data data experts a available different for exchangeable of for all hypotheses step differs wrong of generate significantly single subset often often a small experts for a individual experts least as exchangeable difficulty, at its of as highlights such availability such applications, task The real-world platforms step complex provider tasks. for service them multi-step compose via as and available reuse become to experts enabling Algorithmia, of amount increasing An Introduction 5.1 the evaluate empirically we 2, & 7. Chapter 1 in Hypothesis framework) reference EP confirm Our the To which thereby weights, (and expert algorithms. methods for expert RL learning spaces learning for feature for relational used methods target be we two 2, can introducing Hypothesis by In 2 EP s. this Question To in weights Research central. is approach problem maxi- expert also exploration-exploitation hypotheses global we the expert learning end, that weighted enable and and performance to combined overall need that the such we mize points that data 1 individual Hypothesis for expert in weights with state tasks multi-step respectively We for challenges advice. capture to model decision-theoretic appropriate an chapter This Processes Expert with Learning 5 Chapter Well-established nadto odaigwt xhnebeeprs eas aet aeit con how account into take to have also we experts, exchangeable with dealing to addition In eln with dealing 1, Question Research to solution as framework EP the proposes o hscatri [100]. is chapter this for decision-theoretic sequence ssih hne nipt fe aesgicn influence significant have often inputs in changes slight as , rmwrsprilyda ihteepolm,includ- problems, these with deal partially frameworks tonce. at 61

5 Chapter 5 Learning with Expert Processes

Figure 5.1: Multi-Step Expert Advice in an Expert Process with a) the formal process and b) a NLP example for NERD.

In this chapter, we define theEP to bridge this gap.EPs enable to learn fine-grained expert weights by considering the impact of current decisions on future experts, while not being overly restrictive in rewarding outcomes. Figure 5.1 a) illustratesEPs in terms of their schema. EPs extend MDPs (see Definition 3.34), which model the general sequential decision-making setting. In anEP, world states (variable sh) belong to a single step and are partitioned into decision candidates (variable dsh), which are distinct parts of the world state the agent has to interpret. Actions (variable dah) for decision candidates can only be generated by available h experts. The agent has to query a subset of experts (variable ei ) and then choose among the hypotheses of experts, after which it receives a reward and perceives the resulting state (variable sh+1).

Based on the schema of anEP, we now discuss the illustrated NLP application to step NER (see Figure 5.1 b)).

62 h eta hlegsapoce nti hpe r umrzdsbeunl n ee to refer and subsequently 1.2. summarized Sec are in chapter defined challenges this the in approached challenges central The Challenges 5.1.1 steps, certain set for the representation as feature important, as is use NED. to This e.g. intractable directly. is outputs candidates is their decision possible using learning of than supervised other and learners features among relations relational EWH Combining use the to as extend inference. related we well collective where as for approach, updates 67] RL weight [24, Batch expert a dependencies efficient meta and use for Online- We algorithm an points. for data representation individual for state for generalization as models limiting assuming i.i.d., from stem be for partially to problems distributions, data These data images. heterogeneous or measures text for Relational for problems available example candidates. generalization overcome decision to for promising experts are of performance predictive the query quantify best to how learn NED to queries has then agent agent the the as experts, states, for (weighted) budget experts. possible the all Given on reward. keep based a to experts gets states, has it agent resulting which the and , KB for actions given action all a over in distribution resources probability to entities a named from mappings possible find produce and basketball expert) a single basketball." a tion plays to Jordan query "Michael the a limits action which budget state. e predefined the of) (parts possibly for experts a NER available query to needs exem- agent an an – from, entities choose named to action available find s to be has might state and plary text comprising states perceives agent an 5.1 Example 1 ( oti n,w osrc eainlfauesae,rfre oas to referred spaces, feature relational construct we end, this To to experts NED query to needs still agent the and NERD of step last the not is NER As eprse experts NER generic two assume us Let s 1 A urigeprsotnetiscss n sal osntwn oeeuealarge a execute to want not does usually one costs, entails often experts querying As • = ) ml usto vial xet)mgtrsl nmsigfebc ( so, feedback more missing a in (or Even result expert single might steps. experts) a later available querying as of learning, well subset supervised small with as experts current assess result the to might for learning expert when hypotheses single unusable a or only wrong choosing However, in experts. exchangeable of number 2 1 {( = tce generalization stacked <>ihe odn/>pasbasketball." plays Jordan "Michael = E xml o tpNER) step for example (EP McalJordan" "Michael {( McalJordan" "Michael 1 = McalJra ly basketball." plays Jordan "Michael )}. ,weetecnrldfeec susing is difference central the where ), 3.26 Definition (see , , eMcalJordan Michael . eMcalJordan Michael where NER , is NERD task multi-step for 1 Step 1 , e 2 hc r ohqeid(ignoring queried both are which rbblsi otLogic Soft Probabilistic and eadependencies meta ed ocos single a choose to needs ..e i.e. , ..e i.e. , ),( hleg 1 Challenge } n e and )}, "basketball" ofidpossible find To . . Introduction 5.1 2 ugssac- suggests 1 suggests 2 which , ( ). s (PSL) 1 = ) 63 ,

5 Chapter 5 Learning with Expert Processes

• Local optimization for individual steps often fails when experts perform better with inexact inputs (Challenge 2).

• Generating training samples for learning expert weights requires experts to be executed on training sets. For multi-step tasks, random execution of experts is potentially harm- ful, as experts of later steps are executed in an uncontrolled manner (Challenge 3).

• When labeled data sets are small, it is difficult to engineer or learn well-working feature representations for unstructured data (such as text or images). (Challenge 4).

5.1.2 Contributions Based on these challenges, we now summarize our contributions.

• We provide a formalization forEPs, aligning them with prominent decision-theoretic frameworks which allow to deal with multi-step problems, where feedback is only par- tially available.EPs enable to solve multi-step tasks with expert advice withRL algo- rithms (Contribution1 ).

• We define and use meta dependencies as feature representation for learning, which en- able to integrate other, potentially high-dimensional feature representation as well as exploiting pairwise expert behaviors (Contribution 2.1).

• We present twoRL approaches, namely OnlineRL based on the EWH algorithm and BatchRL using PSL. The EWH algorithm scales well, even when a large number of experts are available. PSL, on the other hand, is able to exploit meta dependencies with respect to pool-based prediction as well as collective inference (Contribution 2.2).

• We instantiate our approaches for task NERD and empirically evaluate it against - lithic real-world systems as well as available expert combination approaches in Chap- ter7.

The chapter proceeds as follows: We first formalizeEPs (Section 5.2) and introduce meta dependencies for experts and data points (Section 5.3). We then present an OnlineRL and a BatchRL approach to solveEPs in Section 5.4& 5.5. We finally summarize this chapter in Section 5.7.

5.2 Problem Formalization

Given a finite layered multi-step task (see Definition 3.2& 3.4), we now define a decision process given the introduced challenges for multi-step expert advice. More specifically, we now introduceEPs which extend MDPs with the availability of decision candidates, experts as well as less restrictive budgets.

64 ( ento 5.1 Definition S , E LtC Let • H Let • Tr Let • Let • Let • Let • e expert an where experts, all of set the be E Let actions. • possible all of set the be A Let • S where states, all of set the be S Let • LtR Let • w Let • , A pcs osbeactions Possible spaces. ( a candidate. action decision choosing each for after feedback received contains feedback local of vector Tr task), multi-step finite layered a of definition the on (based state. a for action expert’s AR state Each otherwise. s or for graph vector, sequence, set, a as s e.g. form, any s in state represented A spaces. state heterogeneous ns in action suggested the return to function expert e the expert use we of notation, of abuse slight With ehd swl sainetwt ok nohrfils uha rdcinwt expert with prediction as such fields, MDPs. other or in bandits works contextual with advice, alignment as well as methods e h , h γ a e fD of set a has , , h ⊆ γ Γ π s Tr + h h . e EXP ∈ h 1 : stetr tt itiuin i.e. distribution, state tart the is ) S : ∈ , . : S : h S R n salwdt ur nse h. step in query to allowed is one [ S 0 N S h × → , , ∈ h H → × + 1 h S × Epr Processes) (Expert ] eteplc,wihmp ttst actions. to states maps which policy, the be A , N h A etenme fsteps. of number the be o tt s state for C R etedson atrhyperparameter. factor discount the be + A h + EXP 1 h → h xetwih ucin hc unie h xetdqaiyo the of quality expected the quantifies which function, weight expert the nato eainfrmpigdcso addtsbtentostate two between candidates decision mapping for relation action an etebde e tp tcnnstenme fsaeepr pairs state-expert of number the confines It step. per budget the be h → ) eiincandidates decision . [ 0 S , d h 1 + A h ] 1 D h ..e i.e. , d h r acyclic are EPs As function. transition deterministic the be : a = eterwr ucin h eadi neioe r episode, an in reward The function. reward the be h j ∈ { d d s h A h . : + h S 1 adi a is and MDP a extends EP An for |∃ h → e = h Γ h d 1 stu nqeyasge oase n a be can and step a to assigned uniquely thus is s ∈ s ∈ S ,a tessdmntaino u proposed our of demonstration eases it as A, h h 1 ,..., ∆ r enda follows as defined are E ∪ ( : S S ( 1 2 s D ) ∪ h . h , s ∈ ... s h h ..D i.e. , + E ∪ 1 h ) S safnto e function a is ∈ H + e h h 1 h . rbe Formalization Problem 5.2 ( eiin edt etaken be to need decisions nsaes state in ossso H of consists s ( h s ) h } , a h ) ilawy result always will h h ..tevector the i.e. , : S h + → 1 disjoint, Rwith AR 8 z h -tuple sa is , (5.1) 65

5 Chapter 5 Learning with Expert Processes

Figure 5.2: Schema ofEPs in the MDP framework.

Figure 5.2 illustratesEPs in theRL framework, extending Figure 5.1 with a feedback loop. It illustrates a single step of anEP, where the agent observes the current state sh, has to choose h h h h+1 among experts e1,...,eN in order to find action a , and finally observes the new state s and receives reward rh+1. Note that respective states consist of decision candidates dsh, which are omitted in the figure. The EP protocol of anEP proceeds in episodes z ∈ Z, where for each episode z the process goes through all steps. In each step h for h = 1,...,H:

• We are given Sˆh (dependent on episode z).

h h • Build state-expert pair distribution PEXPERT(s ,e ).

h h h h h • Choose and query CEXP state-expert pairs (s ,e ) and retrieve a1,...,aM.

h h h ˆh ˆh h d h • Build state-action distribution PACTION(s ,a ) based on Sˆ and A , with A = {a | a ∈ d h h h A }, containing all possible actions based on a1,...,aM. • Choose (sh,ah) and receive R(sh,ah), but observe all residual rewards, i.e ∀(sˆh,aˆh) : R(sˆh,aˆh) 1.

• If h < H then set Sˆh+1 = {sh+1|sh+1 ∈ Aˆh}.

We now define the value functions forEPs, which express optimal characteristics of states and actions. The state-action value function for a MDP (see Definition 3.38) expresses the ex- pected value of a state and action by considering possible future states and actions. We express this condition via expert weights, such that the state-value function is defined as weighted ma- jority vote of experts.

1Feedback might be available in retrospect at h = H + 1.

66 V estimate. 5.2 Definition xetwihsfrtersetv tt.Teitiini htasaehshg au fnumerous if value 5.3 Definition high has state well. a that perform is to intuition estimated The are experts state. respective the for weights expert a action an taking for a h ugse cinadtesmo aia -auso cin o h eutn states. resulting the for actions of Q-values maximal of sum the and action global suggested find the to need we 5.4 as Definition task, multi-step a solutions. for task essential to is respect This with optima actions. and states future for A 5.5 Definition The value. state-action the on P ihfiiemlise ak o h erigpolm h icutfco sfie,i.e. fixed, is factor discount the problem, learning the for tasks multi-step finite with ACTION h → : . h tt au ucinP function value state The where The sa xetwih hudielyepestecretrwr urn state current reward current the express ideally should weight expert an As ie h tt-cinvlefnto,w en the define we function, value state-action the Given A where S [ → 0 w , tt au function value state 1 ( e R h s S ] ˆ ( h h sdfie stenraie tt-cinvlefnto estimate. function value state-action normalized the as defined is d h d sdfie sfollows: as defined is , s s stesto osbestates. possible of set the is a h h h = = ) ) s) EP in distribution State-value function-, value (State EPs) in function value (State-action EPs) in distributions (Probability s) EP in weights (Expert hc xrse h rbblt o naction an for probability the expresses which , { a R h ( | d a s h h h ns in ∈ , e h A Q P (( h h ACTION ( STATE saaoosydfie.Hr,w nyageaeterespective the aggregate only we Here, defined. analogously is sdfie sageaino xetwihs hc ugs action suggest which weights, expert of aggregation as defined is d s : s h tt au distribution value state e h , h )+ )) a ( V h d : = ) ( ( s S s s h h γ h ) → = ) , a ∈ d a h s ∑ . ∈ h ∑ ∈ [ a 0 = ) A s d nepr egti endvatelclrwr of reward local the via defined is weight expert An h h , d h s } ∑ 1 s e h h ∑ ∈ ∈ ] Tr stesto l osbeatosad sw deal we as and, actions possible all of set the is E ∑ s sdfie stenraie tt au function value state normalized the as defined is h h s e ( w 0 ∑ h ∈ s ∈ . e h S ˆ E ( + h h tt-cindsrbto P distribution state-action The . d Q w ∑ 1 s h tt-au-ucin Q state-value-function, The | ) d ( a s e saaoosydefined. analogously is s 0 h ∗ ∈ h h , ( A 1 , a d h a { h s Q h e ) h ( ) ) a d ( h s s + )= max 0 1 a , ∈ a d ob hsni state in chosen be to a A 0 h ) h . rbe Formalization Problem 5.2 } + 1 tt-cindistribution state-action . Q h tt au function value state The ( s h + 1 , a h + and 1 : ) ACTION S × γ o value for s A = based , → : 1 (5.5) (5.2) (5.3) (5.4) . S R 67 × ,

5 Chapter 5 Learning with Expert Processes

h h V(s ) PSTATE(s ) = 0 (5.6) ∑s0∈Sˆh V(s ) h h Finally, the distribution over (state,expert) pairs PEXPERT(s ,e ) is constructed based on normalized expert weights for a single decision candidate.

Definition 5.6 (State-expert distribution inEPs) . The state value function PEXPERT : S × E → [0,1] is defined as the normalized expert weight for a state.

d h h ∑d s∈sh weh ( s) PEXPERT(s ,e ) = d 0 (5.7) ∑s0∈Sˆh ∑d s0∈s0 ∑e0∈Eh we0 ( s ) The reward of the decision process RCUM expresses the expected cumulative reward, similar to finite-horizon MDPs (see Sec 3.39) extended with decision candidates. Definition 5.7 (Cumulative reward for EPs). The overall reward of anEPR CUM : S × A → R is defined as:

H Dh CUM h d h d h R = E[ ∑ ∑ R ( s , a )] (5.8) h=1 d=1 The goal in anEP is to learn how to act in the given decision process, i.e. finding the optimal policy π∗ which maximizes RCUM and, therefore, the number of correct solutions for the respective multi-step task. Learning to balance exploration and exploitation, appears in EPs in terms of learning which experts to query for which states, which can be approached withRL. It is important to note that we do not assume anEP to be realizable. As a consequence, it is possible that none of the available experts returns the correct action for any state configuration. This makes learning more difficult, as one searches for unavailable positive rewards for the available actions. In this chapter, we are interested in learning weights weh for experts, constrained by a bud- h get CEXP. This entails continuously or periodically improving policy π while collecting sam- ples (sh,eh), as the sample collection process directly influences the quality of the learned weights and vice versa. For learning weh , we exploit available state spaces by calculating meta dependencies for experts and decision candidates. We now introduce the concept of meta dependencies, which is our feature representation for learning expert weights.

5.3 Meta Dependencies

To generalize reward signals for expert hypotheses across states, we need to find an appro- priate feature representation for the latter. Before introducing meta dependencies, we discuss straightforward feature representations for NLP tasks.

68 neighborhood isaetasomdit rbblte ycniinn h eiincniae ntesorted define the thus in We candidates measures. decision ψ behavioral the given conditioning a by for probabilities neighborhood into transformed are cies E andpnetkernel dependent main follows. as defined is which neighborhood, sorted a for behave experts of well. cooperate experts which quantify to expert experts and embeddings) word for unavailability entry against dictionary robust missing a is dimensionality. to the kernels due reduces such (e.g. of values variety feature individual a of Using representations. feature able neighborhood. kernel-induced modeling also candidates. while decision single embeddings), on for vector experts based of spaces as behaviors feature (such pairwise low-dimensional representations model (see intro- feature can classification thus available we relational We diverse dependencies, on meta converge. based With not features 3.21). relational might Sec are process which learning dependencies, the meta as duce problematic, are spaces feature no yields which vectors, embedding trained respective decisions. some the sequences for of token features or dictionary available tokens the possible within to bet- all suitable significantly be not However, be be might smaller. might significantly might it sizes generalization end, vector rather where this and are texts, To ter for sets often. embeddings training appear vector not available learned do most ap- single shingles exploit same that use or the Considering grams to follow words, entails most level. k-shingles small, BOW character and word n While length a on enumerated. on words proach be sequential to all retrieve need ex- n-grams all shingles words, where or vectors, grams high-dimensional or words, as n-grams represented pected (BOW are ), bag-of-words features resulting be The would sentences k-shingles. or paragraphs for representation sible 5.2 Example h SINGLE h xetbhvo sdfie by defined is behavior expert The eoefrhrdfiigmt eednisfrcss()(i) edsrb rprisof properties describe we (i)-(iii), cases for dependencies meta defining further Before ihmnmlsimilarity minimal with dependency meta a specifically, More candidate among relations define can we dependencies, meta With A large hire, to costly are experts domain and size in limited usually are sets training As → eadependency meta [ e 0 h e , : hl ii goe h rsneo te xet,()ad(i r ariemeasures pairwise are (ii) and (i) experts, other of presence the ignores (iii) While . i h 1 S , ] e h h rpiso experts of pairs or j × ftesm tp i)toexperts two (ii) step, same the of Eepayfauesae o NLP) for spaces feature (Exemplary N E κ h , δ → ( d N s [ 0 i h κ eiincandidate decision ) , κ , δ h atri osrce o niiuldcso addtsadcon- and candidates decision individual for constructed is latter The . 1 vrgsanme fepr-eedn efrac erc na in metrics performance expert-dependent of number a averages ( ∈ ] d n ar fexperts of pairs and δ s K i h κ = ) with ∈ β Kernels [ 0 PAIR { l , c κ 1 s h ] j : h otdnihoho scntudbsdo do- a on based construed is neighborhood sorted The . | eairlmeasures behavioral : c S s S h h j h r iiaiymaue,wihaebsdo avail- on based are which measures, similarity are MD × ∈ × d S s S i h E h κ h , yvstn iia eiincandidates decision similar visiting by h → l : ψ × : κ e PAIR E [ ( i h 0 E d , × . , s e 1 → i h h pos- a NER, as such NLP , in tasks For j : S ] , + . 
c S 1 → s [ h 0 h j fsbeun tp r(i)asingle a (iii) or steps subsequent of × ) , [ 1 0 ≥ E o igeexperts single for ] , h eutn eadependen- meta resulting The . 1 h δ ] × κ vrgshwsnl-o pairs or single- how averages conditions ∧ E i → 6= . eaDependencies Meta 5.3 [ j 0 d } , s 1 h ] . n ihr()two (i) either and o igeexperts single for β SINGLE l c s h j : nits in (5.9) S h 69 ×

5 Chapter 5 Learning with Expert Processes sists of sufficiently similar decision candidates for which the agent already received rewards. Such neighborhoods vary in size with respect to decision candidates and kernels. To not over- estimate the resulting probabilities, we additionally quantify the density of a neighborhood, which we now define in detail.

Definition 5.8 (Relative density, absolute density). The relative density of the neighborhood is defined as:

d h c h ∑csh∈Nκ,δ (d sh) κ( si , s j ) θ κ,δ (dsh) = j i (5.10) REL i κ,δ d h |N ( si )| The absolute density is defined as:

κ,θ d h κ,δ d h N ( si ) κ,δ d h θ ( si ) = max{ ,1} θREL( si ) (5.11) Mκ N κ,θ d h where threshold Mκ ∈ + defines the expected number of decision candidates in N ( si ), κ,θ d h i.e. |N ( si )| ≥ Mκ . The relative density weights a neighborhood in terms of kernel values and normalizes its estimate by the size of the neighborhood. It, however, overestimates small neighborhoods with highly similar decisions. As a consequence, we need to define a minimal threshold Mκ for the neighborhood size such that small neighborhoods are penalized. We now present meta dependencies for single- as well as pairwise experts and will assume that received rewards are in interval [0,1].

5.3.1 Single Experts Single expert meta dependencies for (eh, dsh) quantify the behavior of eh for dsh based on similar decision candidates csh. Given neighborhood parameters κ,δ,M, we define l h d h l l MDSINGLE(ei , s ) with behavior βSINGLE and condition cSINGLE as:

l d h h l d h h d h c h ∑csh∈Nκ,δ (d sh) βSINGLE( s ,ei )ψSINGLE( s ,ei )κ( s , s j ) MDκ,l (eh, dsh) = j (5.12) SINGLE i l d h h d h c h ∑c h κ,δ d h ψ ( s ,e )κ( s , s ) s j ∈N ( s ) SINGLE i j

We define one instantiation of single expert meta dependencies, which is summarized in Table 5.1. The precision of a single expert (l = 1) is the averaged received reward over all decision candidates available in the neighborhood. The respective condition states that only decision candidates are used for calculation for which the expert was queried. Due to the available expert budget, we need to further constrain the neighborhoods based on the number of existing executions of an expert. We therefore define the condition-specific absolute density of a single expert meta dependency as

70 ewe w xet ftesm tpgvnadcso addt,epotn t neighborhood. condition its and exploiting candidate, decision a parameters given neighborhood step Given same the of experts two between MD dependencies meta intra-step Pairwise Experts Intra-Step Pairwise 5.3.2 arieitrse eadpnece for of dependencies pact meta inter-step Pairwise Experts Inter-Step Pairwise 5.3.3 eadpnec as: dependency meta dependen- Meta prediction. action their on both agree assuming and candidate, latter for decision the cies same for the queried on been experts have two experts of the rewards define received the then averages We the wrong. depicts being dependency expert meta first The ( experts 5.2. Table in summarized xeteeuin.W en the define assessed. We is comparison executions. in expert one the or expert candidate, current decision a the for of queried) reward been the have either both where (and disagree experts respective that case the edfiefu ntnitoso arieitase xetmt eednis hc are which dependencies, meta expert intra-step pairwise of instantiations four define We iia otesnl xetcs,w edt osri h egbrod ae nmissing on based neighborhoods the constrain to need we case, expert single the to Similar MD INTER l INTRA κ , oiatperformance dominant l e l ( i h = e i h ntesbeun tp ie egbrodparameters neighborhood Given step. subsequent the on ( , ) hc esrstedfeec nrcie eadcniindo tlatone least at on conditioned reward received in difference the measures which 2), e 1 l e c i h h j PAIR l , + e 1 h j , , Precision Name θ d d l s as: κ θ s h , h l δ ) κ Snl-tpepr eadependencies. meta expert Single-step 5.1: Table = ) ihbehavior with , ( , δ e ( i h d , ∑ s d h s c = ) h s h j = ) ∈ ( Behavior β N l κ θ SINGLE κ 1 = odto-pcfi bouedensity absolute condition-specific , θ , δ κ ∑ δ ( , κ )a elas well as 4) δ d , c , s ( δ s M h h d j β ) for ( ∈ s ( d β edefine we , PAIR l N h d s PAIR l ) on performance joint κ s h β ( , , ∑ ) δ e l e ( ∑ i h c d n condition and i , ( s = ) s c h e d h s ( ∈ ) h h s j e N ∈ ψ h , i h diieperformance additive d , N κ R , PAIR l , e s δ κ e ( h i h , MD ( h δ | d j d ) , N + ( s s e | d with h ( , 1 N s κ h ) j d e h , ) INTRA , ψ ) s κ δ i d ψ ( h ψ , s ( SINGLE δ l d , PAIR h d l PAIR i s ( l e c ) s )) d 6= i h PAIR l h eadpnec ( dependency meta s , ( ) with h e e ( | j ( ) h d i h j d | uniyterltv behavior relative the quantify ( ) , s as: s d κ e h 1 Condition h s , h j ( , h needn error independent {∃ e , d i e , d i h s . eaDependencies Meta 5.3 i h e e , 6= s fa nr-tpexperts intra-step an of h i , i h e h ( , e d ) ( ) h c j s h l j ) s ) j ihbehavior with ) = } κ κ h j ) ( uniyteim- the quantify , )bt elwith deal both 5) d δ ψ s , h M l , l c edefine we , s = h j ) )which 3) ftwo of (5.13) (5.15) (5.14) β PAIR l 71

5 Chapter 5 Learning with Expert Processes

l Name Behavior β l Condition ψl 2 d 2 d β PAIR( s,ei,e j) = ψPAIR( s,ei,e j) = 2 Error d d d d 1 d d d d |R( s,ei( s)) − R( s,e j( s))| {R( s,ei( s))+R( s,e j( s))≤1} 3 d β PAIR( s,ei,e j) = 3 d 3 Joint d d d d ψ ( s,e ,e ) = 1 d d R( s,ei( s)) + R( s,e j( s)) PAIR i j {ei( s)=e j( s)} 2 4 d d d 4 d 1 d d 4 Dominance βPAIR( s,ei,e j) = R( s,ei( s)) ψPAIR( s,ei,e j) = {ei( s)6=e j( s)} 5 d d d 5 d 1 d d 5 Additive βPAIR( s,ei,e j) = R( s,e j( s)) ψPAIR( s,ei,e j) = {ei( s)6=e j( s)}

Table 5.2: Intra-step expert meta dependencies.

l Name Behavior β l Condition ψl β 6 (dsh,eh,eh+1) = PAIR i j 6 d h h h+1 6 Future ψ ( s ,e ,e ) = 1 h d h d h+1 d h h+1 d h+1 PAIR i j {ei ( s )= s } R( s ,e j ( s )) 7 (dsh,eh,eh−1) = β PAIR i j 7 d h h h−1 7 Past ψ ( s ,e ,e ) = 1 h−1 d h−1 d h d h h d h PAIR i j {e j ( s )= s } R( s ,ei ( s ))

Table 5.3: Inter-step expert meta dependencies.

l d h h h l d h h d h c h ∑csh∈Nκ,δ (d sh) βPAIR( s ,ei ,e j )ψPAIR( s ,ei ,e j)κ( s , s j ) MDκ,l (eh,e , dsh) = j (5.16) INTER i j l d h h d h c h ∑c h κ,δ d h ψ ( s ,e ,e j)κ( s , s ) s j ∈N ( s ) PAIR i j

Note that we removed the horizon assignments from the expert in comparison, as both h+1 and h − 1 would be possible. We define two instantiations of pairwise inter-step expert meta dependencies which are summarized in Table 5.3. The future impact meta dependency (l = 6) measures how experts of the subsequent horizon are rewarded if the decision candidate of the current expert is chosen. Similarly, the past impact meta dependency (l = 7) quantifies how well the current expert is rewarded if the current decision candidate was suggested by the expert in comparison. Finally, we again account for missing expert executions by defining the condition-specific absolute density of an inter-step experts meta dependency as:

l d h h ∑csh∈Nκ,δ (d sh) ψ ( s ,ei ,e j) θ κ,δ (eh, dsh) = θ κ,δ (dsh) PAIR (5.17) l i |Nκ,δ (dsh)| Given meta dependencies for classes (i) - (iii), we define the complete feature set for an ex- pert and a decision candidate consisting of all meta dependencies for all experts and pairwise

72 candidate expert For combinations. expert nepr a ob oprdt tes h ienee o okp a engetda it as neglected be can lookups for needed time The complexity. others. the to influences compared linearly merely be to has expert an for dependencies meta pairwise and O conditions utcluain o ahepr fase n o ahkre.Smlry opeiisfor complexities Similarly, kernel. each O for by and bound be step can a dependencies of meta expert pairwise each O meta for is expert candidate calculations single decision duct for and neigh- expert complexity quasilinear single computational and a the for lookups result, dependencies linear a Lo- achieve As coined to times. techniques used construction – be borhood compared can i.e. are (LSH) vectors – Hashing where complexity similarity Sensitive – quasilinear Euclidean cality similarity yield Jaccard the or to If Cosine- [13] the trees For k-d kernel. exploit respective example, the for O of might, methods metric one approximate used, similarity of complexity is in the computational results on specific dependent potentially the end, is NN this fixed-radius To exact a candidates. O sion is Still, approximate i.e. kernel where – a – studied. performance for NN sufficiently quadratic neighborhood fixed-radius been the specifically have Building more – methods type pairwise). problem its inter (iii) (NN) and or Neighbors kernels pairwise Nearest available intra of number single, the (ii) (i.e. candidate, decision the of neighborhood state-action 5.1. learning Analysis support Complexity they step, subsequent the of EPs. account experts in into values take on dependencies correla- based meta use expert rewards dependencies inter-step meta pairwise future As expert experts. intra-step two pairwise between while tions success, expert’s an of hood ( ( h enldpnetsto eadependency meta of set kernel-dependent The olanepr egtfunctions weight expert learn To | | K S | || log E h | + S d | 1 ψ s ) || h o egbrodcntuto n ulna opeiyfrlous(O lookups for complexity sublinear and construction neighborhood for l S ntnitdfralrsda experts. residual all for instantiated , scntuda no vralkres i.e.: kernels, all over union as construed is | log MD | S | ) κ ( o O (or e i h , d s acltn eadpnec sdpneto i eiigthe deriving (i) on dependent is dependency meta a Calculating MD ( h | = ) K || ( ( | E e ∪ ∪ ∪ e h S h MD h h | ∀ ∀ ∀ , 2 − h e fmt eedniso expert of dependencies meta of set the , w e e e d ) h h h κ j j j 1 s e + + ∈ || h h eas fetniepiws oprsn fdeci- of comparisons pairwise extensive of because – 1 1 SINGLE κ E igeepr eadpnece esr h likeli- the measure dependencies meta expert single , n h epciebhvoa measures behavioral respective the and ∈ ∈ = ) S , h 1 E E | \ log e h h + + i h ∀ ∪ 1 1 l | MD MD ∈{ ∀ ( S κ e | 2 ∈ i h ) ( , , o h ne-tpcs,a h eairof behavior the as case, inter-step the for ) K 3 | INTER κ INTER κ d K , MD 4 , , s 7 6 MD , || h 5 } ) E , MD κ h ( ( κ , || e e l ( ( i h i h S e e , , INTRA κ | h h e e log , , , l h h j j ( d d − + | s s K 1 1 | h h S , , ) ( ) || d d | e ) S s s hncnit falsingle- all of consists then i h h h | o h nr-tpcs and case intra-step the for , log ) ) e i , h . eaDependencies Meta 5.3 j , | d S s | h ) ) sw aet con- to have we as , e h β o decision for l swl as well as ( log (5.18) (5.19) | S | 73 ) ).

5 Chapter 5 Learning with Expert Processes

We now present two Online model-free RL approaches to learn expert weights with meta dependencies as feature representation.

5.4 Online Reinforcement Learning

In the OnlineRL setting (see Definition 3.41), one aims to continuously improve predictions with every observed reward. We take a model-freeRL approach, thus not learning R and Tr (where Tr is known in our special case) but directly approximating expert weights. Note that generalization across high-dimensional states inRL is a hard problem [77] – numerous approaches deal with function approximation, where linear function approximation suffices [86, 124, 139]). ForEPs, where feedback is usually inferred from relatively small training sets, the problem becomes even more difficult. To this end, we use meta dependencies as direct estimators for expert performances. Over multiple episodes, we keep and update meta-weights for meta dependencies to gradually get more accurate predictions. We first apply the EWH algorithm toEPs and meta dependencies. Based on the latter, different measures related to budgeted expert advice, contextual bandits and MDPs are integrated to account for imperfect information due to the expert budget as well as the influence of the current decision on future experts.

5.4.1 EWH with Meta Dependencies Based on the prior defined EWH update rule (see Definition 3.31), the mapping toEPs is straightforward in that we need to extend it with decision candidates. For anEP, the essential EWH update rule for weh (sz) can be defined as:

SCALED d h h d h  d h R ( s ,e (( s ))  ∑ s ∈sz EWH weh (sz+1) = weh (sz) exp η (5.20) d h 1 h h ∑ s ∈sz {q(e ,s )=1} where q : E ×S → {0,1} denotes if expert e was queried for s, ηEWH is the EWH influence parameter of the update and RSCALED returns the rewards rescaled to [−1,1]. The update rule iterates through all decision candidates of a state and keeps a global weight for each expert, as it does not incorporate any available contextual information. To be able to exploit contextual information, we apply EWH directly to meta dependencies and refer to them as meta-experts. The set of meta dependencies ME is defined as follows:

i h d h h d h d h i d h κ,δ d h ∀MD (ei , s ) ∈ MD(e , s )∃mei : mei( s ) = MD ( s ) θl ( s ) (5.21) A meta-expert thus predicts the value of meta dependency MDi(dsh) 2 for candidate dsh weighted by θ κ,δ (dsh).

2 i d h κ,l h h d h κ,l h h+1 d h κ,l h d h For simplicity MD ( s ) refers to either MD (e j ,ek, s ), MD (e j ,ek , s ) or MD (e , s )

74 omlzn h eadb h rbblt falqeideprsi neioeadhorizon, and episode an in as: experts defined queried is all function weight of importance probability the the where by reward the normalizing weighting importance query to allows EP an As Information Incomplete with EWH 5.4.2 to bandits query. contextual not adversarial did and we advice pairs state-expert expert for budgeted account from techniques reuse thus We 5.26). Definition (see otherwise it weakens or function low weight relatively was importance probability resulting The as pair. used state-expert then respective is the for estimate an get follows: as defined be meta-experts for function reward inferred the let iia oeprs ahmt-xetgt sindawih function weight a assigned gets meta-expert each experts, to Similar n paerl o eawihsis: meta-weights for rule update ing eiincniae fasnl tt na psd n horizon. a and all episode for an constant defined in stays is state meta-weight rule single A update a candidates. weight of decision candidates meta-expert single decision the not that and Note states complete illustration. over ease to define we which etu trt hog l osbestates possible all through iterate thus We nepr egti hni endas: defined is then is weight expert An de o elwt nopeeifrainrsligfo budget from resulting information incomplete with deal not does EWH However, let end, this To h rdcino h eaepr hswihstercie eado h sindexpert. assigned the of reward received the weights thus expert meta the of prediction The where R AVG ( regularizer s z , χ me : R = ) ME hc a rpsdfrbdee xetavc.Ti sahee by achieved is This advice. expert budgeted for proposed was which , meta w me w iw C ( ∑ e → ∑ ( d h EXP d o vrg ead,a tete enocsteudt ftequery the if update the reinforces either it as rewards, average for h s d ( s ( s z s ∈ d h z ∈ + E s s , , s z z 1 h me h 1 ifrn tt-xetpis eetn h egtudtswith updates weight the extend we pairs, state-expert different R = ) eteasgmn ucinfo eaeprst xet,and experts, to meta-experts from function assignment the be = ) = ) { meta q ( = ) χ ( ( me w s d ∑ ∑ ∈ s ) me , h me me s S ˆ , z z h me )= ( e s 1 ( ∑ 1 ∈ ) z } d { E ) s χ h steaeaerwr o l eiincandidates, decision all for reward average the is h exp ( 1 ) me S ˆ { R ∑ z h q )=  ( SCALED me n s h xetpoaiiydsrbto to distribution probability expert the use and e e , R s h )= } w AVG me me 1 } P ( ( ( ( s s EXPERT d d z z . nieRifreetLearning Reinforcement Online 5.4 s s ) , h h me , ) χ w ) ( me ( me η s EWH , ( e )( s ) z ) d w s  h me )) : S → R h result- The . (5.24) (5.25) (5.22) (5.23) C EXP h 75 .

5 Chapter 5 Learning with Expert Processes

RAVG(Sˆh,me)  w (s ) = w (s ) exp ηEWH (5.26) me z+1 me z iw(z,h)

To this end, it is essential to deal with the exploration-exploitation problem with respect to non-queried experts. The EXP family of adversarial contextual bandits (e.g. [6]) are based on EWH, but extend the approach to update expert weights for non-taken actions. While in contextual bandits all experts are queried to suggest an action, inEPs all meta-experts are queried to suggest a weight for their assigned expert. We therefore directly reuse methods from the EXP4.P [15] algorithm, an adversarial bandit approach with high expected reward. Here, the probability of choosing an action is equivalent with an expert weight inEPs, which h 1 we adapt with respect to the minimal probability to choose any expert, namely pmin = [0, Nh ].

1 d h d h h h ∑me {χ(me)=eh} me( s ) wme(sz) h weh ( s ) = (1 − N pmin) + pmin (5.27) ∑me wme(sz)

To update the weights of meta-experts, we build on the expert update of EXP4.P, which h uses minimal probability pmin to align the weight update and exploits prior defined importance weight function iw with respect to confidence bounds to express the variance of the reward. It can be directly used forEPs:

 ph  RAVG(Sˆh,me) 1  w (s ) = w (s ) exp min 1 + ηEWH (5.28) me z+1 me z 2 {χ(me)=e} iw(z,h) iw(z,h)

We finally align meta-expert weights with respect to the actions taken in theEP. We thus reinforce the meta reward if the meta-expert was correct but its expert’ action was not chosen or if the meta-expert was wrong but its expert’s action was chosen, i.e.:

wrong(me,sz) =1{q(χ(me),sz)=1}  1 ∗ meta {(e(sz))=a )∧(R (sz,e(s))>0))} (5.29)  ∗ meta + 1{(e(sz))6=a )∧(R (sz,e(s))<0)}

The resulting boosting factor then consists of hyperparameter β ∈ R+ which is either active wrong(me,s ) or inactive, i.e. boost(me,sz) = β z . The final meta-expert weight update rule is:

76 hl Z Z While hyperparameter size batch with ( of available). amount are 5.9 certain instances Definition task a new for fast deferred how be on can dependent is decisions (which where time problems for realistic is assumption eoecniun to continuing before resulting The diction. decision between relations modeling is By it 3.15). as Definition rewards, 3.11). (see and align weights also expert Definition we states, (see of candidates, approach number learning a for batch process which PSL learning a functions, the exploit defer kernel to We of weights candidates. availability the decision expert among to for the relations) due to as (i.e. useful dependencies well links especially meta create as to is for enable dependencies SRL – meta PSL experts. of for i.e. Batch – 3.22 ) nature approach for Definition relational SRL episodes (see scalable of model a number collective fixed use a a to learn after is updates idea conduct central now The we RL. RL, Online in than Other Learning function Reinforcement for Batch SRL use 5.5 we where present EPs, now in We weights rewards. expert of learning use for efficient approximation. approach more RL enable Batch might a which updates, frequent less on h opeiyo acltn l eadpnece ..O i.e. – dependencies meta all calculating | of complexity the O count complexity computational with gorithm 5.2. Analysis Complexity expert’s an how depen- account meta into results. via take future experts rewards influences inter-step suggestion future pairwise incorporate for We meta-experts not where optimization). are dencies, explicit rewards for future 5.4 i.e. Equation techniques, (see RL Search Policy or Value-based w E S me , h adpo-ae pre- pool-based and RL Batch enable we where EP, an of version adapted an define We conduct to entails [80] RL Batch feedback, received the uses directly RL Online While ohmt-xetwih paerl n xetpoaiiyd o pcfial incorporate specifically not do probability expert and rule update weight meta-expert Both E || ( E , s A z h + , − BATCH 1 R 1 = ) | , )( T , | K H w orsod otenme feioe h gn eesudtn xetweights, expert updating defers agent the episodes of number the to corresponds || me Fnt-oio ac xetProcess) Expert Batch (Finite-horizon , C S ( EXP | s log z ) , h | exp Z S EP Batch + BATCH | ))  .Wt epc obthudts n waits one updates, batch to respect With 1. . predictions p al- EWH the on based is approach RL Online proposed the As min h 2 , Z POOL boost sdfie oda with deal to defined is BATCH ) ( 5.1) Definition (see EP finite-horizon a extends which o xetwihsi a in weights expert for me ∈ , ( s N | z E ) +  h | 1 ) n olsz yeprmtrZ hyperparameter size pool and n nyhst diinlytk noac- into take additionally to has only one , { χ ( me . )= i a is EP Batch finite-horizon A Z e } POOL . ac enocmn Learning Reinforcement Batch 5.5 R AVG iw ape ihnasnl step single a within samples pool-based ( ( S z ˆ , h h , ) me (( Z ) BATCH | + E explicitly h iw rdcinsetting prediction | 2 ( + 1 z psds The episodes. , h | E POOL ) h η optimized || EWH E 9 ∈ h (5.30) -tuple + N 1  | 77 + + h .

5 Chapter 5 Learning with Expert Processes

ZPOOL defines the number of episodes the agent is allowed to defer acting. Given that the agent went through the process for episode z − 1, it can observe states sz,...,sz+ZPOOL before acting

(see Definition 3.15). Similarly, the agent collects trajectories for episodes sz,...,sz+ZPOOL before updating (Definition 3.42).

To use all available information for the BatchEP, we train two models for each step – an a priori model before querying any expert and an a posteriori model after having queried all state-expert pairs 3. To this end, PSL enables directly integrating kernels into the model, thereby transferring inferred weights to other decision candidates by collective inference. The section proceeds as follows: we first describe PSL, a template language for generating Hinge-Loss Markov Random Fields (HLMRFs). We, then, elaborate on how to model weights weh to approximate Q.

5.5.1 Probabilistic Soft Logic and Hinge-Loss Markov Random Fields A HLMRF[8] is a conditional probabilistic model over continuous random variables. HLM- RFs are defined as probability densities:

1 M P(Y|X) ∝ norm exp[− ∑ λ jφ j(Y,X)] Z j=1

norm p with Z the respective normalization constant, weights λ j, φ j(Y,X) = [max{l j(Y,X),0}] j the hinge-loss potential functions, linear function l j and p j ∈ {1,2}. PSL[24, 67] is a modeling language for HLMRFs. A model in PSL comprises weighted, first-order logic rules (templates for φ j) with features defining a Markov network. Here, PSL relaxes conjunctions, disjunctions and negations as:

A ∧ B = max{A + B − 1,0}

A ∨ B = min{A + B,1} ¬A = 1 − A with A → [0,1] and B → [0,1]. The use of hinge-loss feature functions makes inference tractable, as it is a convex op- timization problem and thus potentially scales to large datasets. We now present our PSL model based on meta dependencies forEPs.

3Note that we gradually increase our training set with each batch, where densities provide a natural way to passively forget older samples. An alternative would be to actively forget all old batches but reuse old weights as priors.

78 θ MD densities respective with MD dependencies meta pre-computed MD are experts inter-step pairwise W Var : Relation e.g. normalized and 5.4 episode. batch Equation Labels a on for step. based gathered same constructed are the samples manually of are experts different weights experts across expert all generalize for for to function enable weight dependencies expert meta one that learn approach, online our in weights their learn and weights expert learn We PSL with Dependencies Meta 5.5.2 raigepr egt fteei agreement. is there if weights expert creasing we Here, yet. queried been not have experts where setting we priori We Agree a predictions, relation them. the define with expert rules for current the rules the extend model and for first, relations account disagreement To and agreement model high. additionally sufficiently are dependencies meta as rules, same case. expert the single of the consist in information models more posteriori us a give not and does priori expert A respective the modeled. querying explicitly be to relation has weight expert on influence positive as candidate decision suggests expert an that denotes relation, transition ortr rbblt fareeto iareetbsdo h oee conditions modeled the on based disagreement or agreement of probability a return to l κ ( All fe aigqueried having After respective if increases expert an of weight the that express rules expert intra-step Pairwise latter the use directly we where dependencies, meta expert single for rules model first We , δ e INTER h ( d , d n s h s ayrltosw en a hi aibe oara ubrbtenzr n one, and zero between number real a to variables their map define we relations -ary h ) − sdfie o ige swl spiws eadpnece.Tr dependencies. meta pairwise as well as single- for defined as , ) eadpnec eain o arieitase xet MD experts intra-step pairwise for relations dependency Meta . D κ ¬ , l MD MD ( ¬ e MD MD i h i , SINGLE κ SINGLE κ × e , , h l l j ( INTRA κ INTRA κ Var + e , , 1 λ i h l l , C , j w d e j mdlprse n,ohrthan other and, step per model PSL single a construct therefore We . EXP ( ( h s h j → e e e h ( ( , h h h ) e e d , , t osrc ifrn oeta functions potential different construct to PSL using by i h i h s [ n MD and d d tt-xetpis ecnepotnwifrainb nyin- only by information new exploit can we pairs, state-expert 0 h , , s s e e ) , h h l h h 1 j j INTER κ ) ) hc seautdfrtecmlt egbrodi order in neighborhood complete the for evaluated is which , , , ] , ∧ ∧ d d l for s s MD MD h h ) ) SINGLE ( n ∧ ∧ ∧ ∧ e SINGLE SINGLE i h = Agree MD Agree MD , e .A xetweight expert An 2. h j − + INTRA INTRA 1 W D ( ( − − , e e d e κ D D h i h i h s , l h eaino h uehlsa eland well as holds rule the of negation The . , , h − − ( κ κ e e ) e , , h h l l j j D D h ( ( , , n igeeprsMD experts single and , e e d d κ κ d h h s s , , s l l , , h h ( ( h d d ) ) e e . ac enocmn Learning Reinforcement Batch 5.5 ) l l s s este r acltdbsdon based calculated are Densities . i h i h h h , , = = = ) = ) e e ¬ ⇒ ⇒ h h j j , , d d ¬ ⇒ ⇒ w s s W e h h h W ) ) W ( srpeetda relation as represented is e W ( INTRA c i h ( e s , e i h h ( E d h , + e s , h d h h 1 d soritiinis intuition our as , ( s , ) − s e h for INTRA κ d h ) h , s D ) l , h SINGLE κ d ) κ d , s , l s l h h ( ( , . e e c i h i h s , , h φ ( after e e + e j ψ h h j j 1 ( h , , X ) , l d d the , . d s s , s Y h h all 79 h ) ) ) ) , ,

5 Chapter 5 Learning with Expert Processes

κ,l h h d h κ,l h h d h MDINTRA(ei ,e j , s ) ∧ MDINTRA−D (ei ,e j , s ) h d h c h+1 h d h c h+1 h d h ∧ Tr(ei , s , s ) ∧ Tr(e j , s , s ) =⇒ W(ei , s ) κ,l h h d h κ,l h h d h ¬MDINTRA(ei ,e j , s ) ∧ MDINTRA−D (ei ,e j , s ) h d h c h+1 h d h c h+1 h d h ∧ Tr(ei , s , s ) ∧ Tr(e j , s , s ) =⇒ ¬W(ei , s ) Rules for pairwise inter-step expert meta dependencies express that current expert weights have to be high if respective experts are either estimated to perform well or if they generated good states. Here, we use the same rules for both the a priori and the a posteriori setting.

κ,l h h+1 d h κ,l h h+1 d h MDINTER(ei ,e j , s ) ∧ MDINTER−D (ei ,e j , s ) h d h =⇒ W(ei , s ) κ,l h h+1 d h κ,l h h+1 d h ¬MDINTER(ei ,e j , s ) ∧ MDINTER−D (ei ,e j , s ) h d h =⇒ ¬W(ei , s ) d h c h We propagate inferred weights by relation Similarκ ( si , s j ), thus using kernels to exploit similarities among decision candidates. It becomes clear that pool-based predictions might potentially increase the performance of the approach by clustering similar decision candidates through kernels.

d h c h h d h h c h Similarκ ( si , s j ) ∧ W(e , si ) =⇒ W(e , s j ) d h c h h d h h c h Similarκ ( si , s j ) ∧ ¬W(e , si ) =⇒ ¬W(e , s j ) We finally align experts with similar outputs by driving their weights to be the similar. For the a priori setting, we define rules based on the agreement probability.

h d h h h d h h d h W(ei , s ) ∧ Agree(ei ,e j , s ) =⇒ W(e j , s ) h d h h h d h h d h ¬W(ei , s ) ∧ Agree(ei ,e j , s ) =⇒ ¬W(e j , s ) Based on the observed expert action candidates, rules for the a posteriori case are modeled as:

h d h h d h c h+1 h d h c h+1 W(ei , s ) ∧ Tr(ei , s , s ) ∧ Tr(e j , s , s ) h d h =⇒ W(e j , s ) h d h h d h c h+1 h d h c h+1 ¬W(ei , s ) ∧ Tr(ei , s , s ) ∧ Tr(e j , s , s ) h d h =⇒ ¬W(e j , s )

For learning weights of the PSL model and inferring weh , we use an approximate maximum likelihood weight learning algorithm and most probable explanation (MPE) for probabilistic inference [7]. More specifically, we reuse a voted perceptron, where an iteration of weight

80 r utbet aiietenme of number the maximize to with suitable MDPs are specializing EP s by budgets. advice execution expert as multi-step well for as s EP experts formalized available we chapter, this In Summary 5.7 basic their spaces), state reused. the directly in be uncertainty can on functionality based pairs state-expert possible of ap- amount efficient large however, are, There problem. Neighbor most. efficient, Nearest are corresponding the updates the costing EWH for is proximations The dependencies scarce. meta are calculating resources computational where when favorable still is it approaches behavior. our expert optimization, pairwise enable learn to to weights how dependencies expert showed meta future of we exploit (ii) While terms additionally in states. constraints, for functions budget behavior value expert from redefine unknown to resulting (iii) RL and of information outputs use partial expert the of (i) enables impacts and with advice, deal expert (budgeted-) to and techniques MDPs on builds framework EP The Discussion learner.5.6 chosen the on dependent is data training on based weights inferred the adapting and rsligi rudnso l ariedcso addtso h ac)adedu with up end and kernels batch) for the of functions) RL candidates potential Online decision complete pairwise (i.e. the all O rules availabil- exceeds of the PSL de- thus groundings and incorporate grounding in it variables (resulting of additionally as involved complexity we quantifiable, relations, The as absolutely of episodes. (ii) approach, variables. number not batch rules), latent relations, is the PSL of of of (ii)) ity arity length (point (i.e. the the functions functions on (iii) potential potential pends and of available functions complexity of potential ground- number The respec- individual for the steps, of needed individual (i) complexity time general for on the The needed dependent time is learning. the and ing with inference grounding, dealing comprising by tively approached be can HLMRF 5.3. Analysis than Complexity (other problem optimization Networks convex MLNs). a Logic in to Markov available cast in as soft be example, problems can that for combinatorial inference fact used, that the and as training exploits used values MPE, using are truth training), (MLNs)) without binary model for targets than used PSL set (other also the values is prior update truth (which with to method agreement inference the then the The case and maximizing our data. values, of (in target data terms generate training in available to weights the calculated) on we condition weights and expert inference run to entails learning (( epcal oda iha with deal to (especially EP s for used be to adapted be to have algorithms RL Although predictions, pool-based and learning batch exploit cannot approach RL Online our While | E h | 2 + | E h || E h + 1 | + | E h uniyn h opeiyo erigwihso ninstantiated an of weights learning of complexity the Quantifying || E h − 1 | )( | K || S | log | S || Z BATCH | 2 )) h ienee o inference for needed time The . . Discussion 5.6 81

5 Chapter 5 Learning with Expert Processes correct task solutions, as they define essential properties of expert weight functions. The latter express the expected performance of an expert for an individual decision candidate, but also take into account the impact on experts of future steps. Our work onEPs addressed Research Question1, where the corresponding Hypothesis1 can be partially confirmed and will be revisited after empirically evaluatingEPs in Chapter9. For learning fine-grained expert weights for decision candidates of states, we introduced meta dependencies as features which calculate performance-related relations among single- or pairs of experts. We showed that meta dependencies can be best exploited by OnlineRL based on the EWH algorithm and BatchRL with PSL as function approximator, both enabling to learn expert weights forEPs. We therefore addressed Research Question2 with ourRL algorithms forEPs and, similar to Hypothesis1, can partially confirm Hypothesis2, but have to revisit the latter after the empirical evaluation in Chapter9.

82 ocodnt isdudtsad(i)sedn plann yfis riigmlil agents multiple training [94]. first policies learned by their learning coordinate up them have speeding eventually (iii) and agents and independently among benefits, updates shared mutual be missed provides can coordinate experts MAS where to a architectures, stakeholders in centralized multiple cooperation to redundancy with but adding available settings (ii) publicly problem not real-world are novel, experts model where to in enabling result (i) might namely agents multiple to to advice expert order multi-step in within MAS improvements. distribut- experts similar a consequence, of in a control As agents the performance. separate predictive ing train improve to or different time enables conceptually convergence frames, functions into minimize observable tasks reward on such different decomposing based with that games show steps play authors to The learning agents. as multiple sufficiently to such are tasks, errors single-agent their complex if ing experts two individual of characteristics. each similar prediction for with the points correlation data combine expert for investigate only independent to to sensible and thus point is data It manage. and disclose or detected, their easily in be occur correlate behavior might experts adaptive correlation when any Expert misperform learn errors. to not joint policies do learned and cause queried respect might merely With which are scenario. esti- themselves, experts learning global 5.1 ), the Definition a on (see learns depending EPs decision-maker policy, to or single value-function a model, that the entails of RL mate in perception SAS prevalent The Introduction 6.1 To reference Our developed. 7. approaches be Chapter coordination in to Multiagent the MEPs three have of evaluate for empirically agents perception we Multiagent 3, cooperating Hypothesis a for confirm fully proposed approaches 3 coordination Hypothesis where problem. problem, correlation expert the with chapter This Processes Expert Multiagent with Learning 6 Chapter stphssvrladvantages, several has setup MAS a correlation, expert local with dealing to addition In in advances recent end, this To wihdeals which 3 Question Research to solution as framework MEP the proposes o hscatri [104]. is chapter this for locally ie nyfrdt ihseiccaatrsis hc shr to hard is which characteristics) specific with data for only (i.e. eaaino Concerns of Separation globally ie o lotaldt ons hc can which points) data all almost for (i.e. rmt distribut- promote 134 ] [82, (SoC) 83

6 Chapter 6 Learning with Multiagent Expert Processes

In this chapter, we investigate these findings for the problem of multi-step expert advice, where we frame the setup as cooperative MAS, more specifically as MEP, where each expert is weighted and controlled by an individual agent. The resulting problem then is to learn how to coordinate expert weights and actions to optimize the decentralized weighted majority voting among agents. The central intuition is that the coordination process might enable to reduce the available bias when experts strongly correlate. Figure 6.1 denotes the general schema of multi-step expert advice in a MAS. In the bottom layer, the conceptual steps of the respective multi-step task are shown, e.g. NER and NED for NERD. For all available steps, agents need to take joint decisions (i.e. choose joint actions) which can, for example, be the result of the weighted majority voting of all agents of the step. In the top layer, all available agents are shown, where an agent controls one or more experts and has to primarily coordinate its available actions (i.e. the action it receives via its experts) with residual agents of the same step. To account for future impacts of its decisions, an agent can also coordinate with agents of the subsequent step. The opposite communication direction is also possible, as agents might want to learn which agents of prior steps produce good or bad actions. For better illustration, consider the following example for conceptual steps NER and NED of NERD. Example 6.1 (The NERD task in a MAS). Agents for step NER need to sequentially decide for pieces of texts, such as Michael Jackson plays basketball., which token se- quences constitute named entities. Given black-box experts solving NER, agents coordinate prevalent actions, e.g. if Michael Jordan is a named entity or not. After a number of co- ordination rounds, the weighted majority estimate – i.e. the joint action – is passed to agents of step NED. To this end, NER agents also need to communicate with NED agents to find out who exploits their individually proposed actions. By distributing the control to multiple agents, we extendEPs to the MAS setting by spe- cializing MMDPs with the availability of experts and budgets. We then present an adequate feature representation based on meta dependencies for individual agents, which expresses var- ious relationships among experts and agents with respect to the coordination problem. We first approach passive coordination in MEPs (i.e. the independent learner setting, see Definition 3.46), where agents only know about their own experts’ predictions and the result of the weighted majority voting, but still have to align each other. Our passive coordination approach builds on the EWH algorithm to adapt agent weights with respect to the weighted majority vote of all agents. For agents to actively coordinate their experts, we define a decision protocol inspired by swarm intelligence. Here, agents receive information about all residual actions to reconstruct the global state (i.e. joint-action learner setting (see Definition 3.45)). Other than in one-step coordination settings as available in SoC or most hierarchicalRL approaches, we model a designated decision process for coordination. The latter – which we refer to as expert co- ordination process (ECP) – yields local finite-horizon MDPs with continuous action spaces

84 hscatrcnrbtsi h olwn ways: following the in contributes chapter This Contributions 6.1.2 initially as challenge, following the with deals chapter this 1.2: Section 1-4, in defined challenges to addition In Challenges 6.1.1 process. learned coordination each fixed-length the where a in policies, propose decision agent then single non-stationary We a robust basis. reflects learn rule model meta to a on approach on based RL experts heuristic their Search a in for develop Policy confidence weights first We its agent agents. adapts quantifies residual which which to dependencies function learned. respect weight be with relative actions to a experts’ have learns its policies agent an coordination ECP, non-stationary the where Within agent, participating each for ihapsie n nactive an and passive- a with MEPs define and MAS to formalization EP our extend We • one learning, ensemble or advice expert with prediction on based models learning When • eorlt xetwihs o ciecodnto,w en oe eiinprocess decision novel a define we coordination, active to For solutions weights. lightweight develop expert to decorrelate enables coordination Passive protocol. coordination wrong potentially overestimating ( in hypotheses results expert which correlation, expert from suffers often Shm fMliSe xetAvc naMAS. a in Advice Expert Multi-Step of Schema 6.1: Figure hleg 5 Challenge ). . Introduction 6.1 85

6 Chapter 6 Learning with Multiagent Expert Processes

(referred to as ECP), which extends one-step coordination in MAS to a multi-step de- cision process. The ECP enables to develop complex coordination approaches, as more information are available about other agents (Contribution 3.1).

• We extend meta dependencies to relational agent features, which are used as feature representation for active coordination (Contribution 3.2).

• We propose a lightweight coordination approach for MEPs with passive coordination, where agents cannot communicate their actions (Contribution 3.3).

• We develop two coordination approaches for MEPs with active coordination, where agents are allowed to communicate to learn better coordination strategies (Contribution 3.4).

• We empirically evaluate our approach for the multi-step NERD task with respect to maximizing the overall expected reward in Chapter7.

The chapter proceeds as follows: We first formalize our novel problem setting and introduce MEPs in Section 6.2. We then present our approach to passive coordination (Section 6.4, where respective agents receive limited information about their environment. In our active coordination approaches (Section 6.5 and 6.6), we assume full information for all agents, which enables more efficient coordination. We finally conclude with a summary of this chapter in Section 6.8.

6.2 Problem Formalization

We now define MEPs, where cooperative agents share the control over available experts and have to choose joint actions which maximize the number of correct task solutions. Fig- ure 6.2 illustrates a MEP, where the joint action is chosen based on an arbitrary function (e.g. weighted majority voting). Definition 6.1 (Multiagent Expert Processes). A finite-horizon MEP extends a finite- horizon MMDP (see Definition 3.44) with the availability of experts and is a 11-tuple COMM ({Sλ }λ∈Λ,Λ,{Eλ }λ∈Λ,{Aλ }λ∈Λ,{Aλ }λ∈Λ,R,Tr,T,H,CEXP,CCOMM). COMM • Let {Sλ }λ∈Λ be the set of states, Aλ∈Λ be the set of environment actions and Aλ∈Λ be the set of communication actions, as defined for MMDPs.

• Let CCOMM the communication budgets as defined for MMDPs. It confines the interac- h h h h+1 h+1 h tion among agents (λi ,λ j ), (λi ,λ j ) and (λi ,λ j ) through communication actions COMM 1 in Aλ∈Λ , where each agent gets to query respective agents CCOMM times . 1Without loss of generality, we assume unidirectional communication, where an agent has to actively use its budget.

86 ur xet,adcluaewihs h hleg o gnsi ofida find to is agents for challenge The weights. calculate and experts, query faeokwt narbitrary an with framework MDP the in Advice Expert Multi-Step of Schema 6.2: Figure ,weealaet aet hoeand choose to have agents all where EPs, to analogously defined is protocol MEP The The Frsteps For • LtT Let • LtE Let • LtT etetasto ucin h eadfnto,HtehrznadC and horizon the H function, reward the R function, transition the be Tr Let • Let • ttst cin,ie e i.e. actions, to states i.e. actions, to steps coordination and EPs. for defined as budget, state-expert protocol – – – – – Λ ehns ocos on actions. joint choose to mechanism λ ahaetcossa cinadst t nta weights, initial its sets and action an chooses agent Each h gn omte rdcsaction predicts R committee agent The A expert. weighted highest tions queries and chooses agent Each Each ∈ etesto gns hr nagent an where agents, of set the be ( etesto xet o agent for experts of set the be oriainprotocol coordination s N z h h , poed nepisodes in proceeds MEP a of ≥ a a λ = 0 h 1 h i ) ,..., etenme fcodnto steps. coordination of number the be eevsstate receives 1 ∀ ,..., a h a ∈ C h H EXP A : h : . . S → s z h EPs. for defined as A, fepisode of nsjitaction joint finds λ λ C i ∈ h EXP z : z Λ . λ T = hr nepr e expert an where , tt-xetpairs state-expert ∈ × 1 Λ a ,..., S t ∗ h a safnto vrdcso candidates decision over function a is → ∗ n ahagent each and Z A : h . rbe Formalization Problem 6.2 ( s ∈ z h p , E i e ( on action joint λ h 1 ) observes , safnto from function a is d n eree ac- retrieves and s ) ae nits on based , rewards (i.e. EXP a the 87 t ∗ )

6 Chapter 6 Learning with Multiagent Expert Processes for every step h. The MEP protocol was kept variable with respect to the used coordination protocol, as we distinguish among passive coordination and active coordination. Coordination steps t ∈ 1,...,T are only used for the active coordination setting, and will be kept static (i.e. t = 1 throughout the MEP) for passive coordination. Before introducing properties of value functions, we define the relationship between agents and their experts, as well as available weight functions.

Definition 6.2 (Agent-expert relationship, agent- and expert weights, agent prediction). An h h N+ h h ≥0 agent λi manages expert set ei1,...,eiM ∈ Ei with M ∈ . Let wi : Ei × S → R be the h expert weight function of λi to score its experts. Let now ei denote the expert with the highest estimated weight for a decision candidate:

h h d h h d h ei : wi(ei , s ) = max wi(e , s ) (6.1) h h e ∈Ei

h h ≥0 h The agent weight pi : T × S → R expresses the confidence in the action of expert ei , the h expert with the highest estimated weight. To this end, an agent predicts the action of expert ei h d h h d h for t = 1, i.e. λi (1, s ) = ei ( s ) h The MEP extensions for individual agent weights pi and value function Q with regards toEPs are straightforward. Individual agents are now in in control of their weights and predic- tions, and the same global properties hold as forEPs. An agent weight for a decision candidate dsh has to express its local reward and the maximal Q-value of the next step, reached via any action which includes choosing dsh:

Definition 6.3 (Agent weights in MEPs). An agent weight in a MEP is defined analogously to expert weights inEPs:

d h d h d h h+1 h h h h+1 h+1 pi(t, s ) = R( s ,λi(t, s )) + ∑ Tr(s |s ,a ) max Q (s ,a ) ah+1∈Ah+1 (6.2) ah∈Ah d sh

h h h h d h h with Ad sh = {a |a ∈ A : ∆eh ( s ) ∈ a } as defined forEPs (i.e. the set of all actions, where dsh is chosen 2).

The corresponding state-action value estimate for state sh and action ah is based on all involved agents, their weights for all decision candidates and their proposed actions.

Definition 6.4 (State-action value function in MEPs).

d h h d h h p (t, s )1 h h h h h ∑λi∈Λ ∑ s ∈s i {λi(t,s )=a } Q (s ,a ) = h . (6.3) h d h h ∑λi∈Λ ∑ s ∈s pi(t,ds )

2Note that this set might be incomplete if perfect communication is not possible

88 eemnd edfietejitato ob the be to action joint the define We determined. step 3.46). Definition (see setting C set we p weight agent 6.7 Definition coordination step passive respective the on dependent policy λ 6.5 Definition values. nbiga gn oaadniseprs rdcin oflo h cinwt ihs weight. highest with action the follow to predictions from experts’ ideas its abandon of to terms agent in an function enabling prediction agent the extend We [29]). (e.g. setting learner joint-action the follow predictions thus action We residual all episode. observes an agent each for i.e. communication, constrain not do hence p weight its change ECP. 6.8 the this in Definition to use illustrated corresponds is and process coordination actions decision Active novel their actions. the observe and where to 6.3, weights own Figure agents their other adapt with better communicate to knowledge can agents Here, cess. R 6.6 Definition EPs: for sdfie as: defined is , ∈ fe l gnscoeteratosadwihs h on cino l gnshst be to has agents all of action joint the weights, and actions their chose agents all After plce,which policies, MEP non-stationary of type novel a define now We ECP. the define then and MEP the for adapted are agents of predictions how explain first We pro- decision designated a in actions their negotiate to agents enables coordination Active agent of prediction the now, Until reward cumulative expected the maximize to is MEP a of goal overall The Λ = λ h π 1 i tcodnto tpti endas: defined is t step coordination at h i ,... COMM i : ( hoe h cinpooe yisepr,i.e. expert, its by proposed action the chooses 1 S , h T d → s erfrt hsstigas setting this to refer We ECP. the in act to require thus and ) Cmltv eadfrMEPs) for reward (Cumulative = MEP) a of action (Joint PsieCodnto nMEPs) in Coordination (Passive Atv oriaini MEPs) in Coordination (Active ae nterwr ftewihe aoiyvtn ftecmite Hence, committee. the of voting majority weighted the of reward the on based A 0 h ee ahaettist n a find to tries agent each Here, EP. single-agent a in n obdcmuiain h etn eest h needn learner independent the to refers setting The communication. forbid and i ( . t , asv coordination Passive d s ) swl siscoe action chosen its as well as R a t ∗ CUM : h Q nodrt maximize to order in λ h = ( i h s . E h ihsai oriainstep coordination static with , h on cina action joint The [ a h ∑ = H t ∗ 6.2. Figure in schema MEP general the to equal is = ) 1 . d ∑ D = . R MEP, a of reward overall The h . 1 max a o T For R ∈ aia action maximal o igecodnto tp ..T i.e. step, coordination single a For A h h ( d Q s λ h h i > ( λ , ( s d t i h ( , a 1 R 1 , d t ∗ h a , s CUM oriainses agent steps, coordination )] d ) ) falaet ftecretstep current the of agents all of s ee estC set we Here, = ) wr intelligence swarm erfrt hsstigas setting this to refer We . . rbe Formalization Problem 6.2 ihrsett state-action to respect with e i ( t depend d = s non-stationary ) n nyaat its adapts only and , ciecoordination active seult a to equal is 1 R COMM ncoordination on CUM CUM sdefined as , = : ,by [146], S | × Λ policy, λ | i local (6.5) (6.4) A = and can → 89 1 , .

6 Chapter 6 Learning with Multiagent Expert Processes

Figure 6.3: Schema of Multi-Step Expert Advice in the MDP framework with active coordi- nation protocol.

Within a coordination step, an agent either chooses the action proposed by its highest weighted ∗ expert or follows the committee average at , resulting in the following agent prediction func- tion at coordination step t.

Definition 6.9 (Agent prediction function for a coordination step). An agent λi predicts the action of its expert ei for t = 1 (as defined before) and adapts the weight based on the following update rule:

d h d h d ∗  λi(t + 1, s ) = max ei( s ), at ) (6.6) d h d h d ∗ h d h d ∗ where ei( s ) is scored by pi(t + 1, s ) and at by Q ( s , at ) Definition 6.10. An ECP extends a MMDP and is a 8-tuple ECP ECP λ ECP ECP (S ,Λ,A ,{E }λ∈Λ,R ,Tr ,H,T). ECP ECP ECP • Let S be the coordination step dependent state space. A state st ∈ S consists of state information of the respective MEP as well as current action predictions and weight estimates of all agents of the coordination step t, i.e.

ECP h h d h d h h d h h d h s = s ∪ h h d h h {(λ , a ,w)| a = λ (t, s ) ∧ w = p (t, s )} (6.7) t ∀λi ∈Λ ∀ s ∈s i i • Let Λ be the set of agents, as defined for MEPs. • Let AECP be the continuous action space.

λ • Let {E }λ∈Λ be the agent-dependent expert space, as defined for MEPs.

90 h nydfeec otedfiiino gn egt in weights agent of definition the to difference only The MEP. the of values (i.e. state-action MEP future corresponding the of function tpt agent t, step 6.11 Definition agent, an of ECP. the summarizes 6.4 Figure The functions weight MEP the on effect have which values, continuous are ECP the in Actions The Ec gn eevsreward receives agent Each • MEP the of action joint The • action MEP on decides agent Each • Tr Let • R Let • Ec gn ae oa action local takes agent Each • MEPs. for defined as step, coordination the be T Let MEP. • current the of step the H Let • R protocol ECP tt-cinvlefunction value state-action ( d s h ECP , p ECP λ a i t ∗ i ( ) t : MEP corresponding the of function weight its adapts also : n action and ) ECP) the in actions (Agent + T : S × 1 t ECP , S d rcesfrcodnto steps coordination for proceeds t s ECP h × ) . A × t ECP A a TeEPwti MEP. a within ECP The 6.4: Figure t t i ECP p , ECP → i ( t → + S r . a t ECP t ECP a + t ∗ 1 (i.e. ECP an of R t i , 1 , sdtrie swihe aoiyvote. majority weighted as determined is ECP d ≥ acltdbsdo h eadof reward the on based calculated , s h rniinfunction. transition the 0 h ) etecodnto tpdpnetrwr function. reward dependent step coordination the be λ wih function weight MEP its adapting by Q i ← ( . h t aeaddto added are ECP the of rewards local that in ) ytkn cina action taking By + p i 1 ( t , , s d h s ) h . Q t + ) t = sdfie ihrsett h value the to respect with defined is ) a 1 t i ,..., , ECP T t i , ECP : . rbe Formalization Problem 6.2 ∈ A ECP a t ∗ (i.e. MEP the in ncoordination in p i ( t + 1 , d (6.8) s h 91 ) .

6 Chapter 6 Learning with Multiagent Expert Processes

MEPs is the coordination step dependent reward estimate. It is dependent on the joint action ∗ d h at of the MEP, i.e. the adapted agent weight pi(t, s ) is evaluated with regards to the positive ∗ or negative impact agent λi has for joint action at . Definition 6.12 (State-action value function in ECPs).

ECP i,ECP ECP ECP i,ECP ECP h+1 h h h h+1 h+1 Qt (st ,at ) = R (t,st ,at ) + Tr (s |s ,a ) max Q (s ,a ) ∑ h+1 h+1 sh+1∈Sh+1 a ∈A (6.9)

Learning to act in a MEP with ECP entails learning a non-stationary agent-dependent policy ECP ECP πi : T ×S → A for the ECP. The policy is non-stationary since it depends on the current ∗ coordination step. As an agent is only rewarded if the correct final joint action aT has been chosen, the reward function is dependent on the coordination step as well. EXP EXP In this work, budget Ci is agent-dependent and set to Ci = 1 ∀λi ∈ Λ, i.e. an agent is allowed to query its expert exactly once 3. We start by introducing a novel approach to learning local expert weights for decentralized agents by exploiting the assumption that all experts have to be executed for the same decision candidate.

6.3 Expert Weight Learning for MEPs

EXP In MEPs with Ci = 1 and perfect communication, we can exploit fixed-length feature spaces, as each expert is queried for the same decision candidates. As a consequence, any online supervised learning technique is applicable (such asGD, see Definition 3.18) to learn weights for our feature representation. To this end, we use the set of all meta dependencies (see Section 5.3) as feature represen- tation and assume a linear relationship among them. We thus deal with linear regression (see Definition 3.17) and standard squared loss LSQUARED, usingGD to perform weight updates. For the MEP, we need to learn the agent-dependent model parameter vector φi for agent λi:

GD d d h T pi (1, s) = MD(ei, s ) φi (6.10) To this end, multiple options for using labels exist. A straightforward way is using rewards d d R( s,ei( s)) for a decision candidate and the expert of the agent, where future impacts of the action are only taken into account via inter-step meta dependencies. An alternative would be to use an estimate of the state-action value, which is approached by the well-known Q−Learning algorithm (extend with function approximation) [140]. d h d d With expert rewards as labels, we derive tuples (MD(ei, s ),R( s,ei( s))) to learn the model parameters, where deriving the gradients for LSQUARED is straightforward. The result- 3Otherwise, agents are enabled to manipulate states to learn about their expert, which opens different research questions.

92 udt uefor i.e.: rule reward, update as EWH action standard joint the the of use reward We the use expert. its by proposed action R (i.e. action steps coordination joint ignore taken therefore difficult. the We coordination on makes but reward. based scalability, received improves weights the their as adapt well to as agents requires coordination Passive MEPs with Coordination Passive 6.4 coordination, passive with start allowed. We not is values. using agents initial but among as communication case, where weights this expert learned with covered. having fully deal without yet already not is 6.8. 5 advice Section Chapter expert in multi-step in case for techniques proposed budgeted learning the methods supervised to standard SAS models our regression that for extension Note possible a discuss will MEP s. for 5 Chapter in proposed h eutn gn egti endas: defined is weight agent resulting The EWH. to similar updated is which agent, additional each an define we specifically, More in is agents needed sub- the the (querying and complexity not current in the is prior, increase latter the significant the O of If a agents yielding all account. with step, into communicate sequent taken to not has is agent experts an where residual optimized, communication, all perfect assume of we predictions that Note the retrieving 5.3). Section (see dependencies meta of ity GD for space O feature is agent as single used a of is update dependencies model a meta O for where defined complexity computational of resulting set the and complete the 5.4), tion 6.1. respect Analysis wit Complexity choice the and MEP. the once as exactly by parameters, sample controlled epoch each is or use pairs state-expert iteration we to since incorporate , SGD not each and do for GD thus candidates for We decision available SGD. of by combinations approached using possible as all by execute expert) 3.18 ) to Definition entail (see would (which SGD ples and GD between situated available every are updates weight expert ing ( nodrt lg urn gn weights agent current align to order in 3.31) Definition (see algorithm EWH the using by , ahagent each MEP , a in coordination passive For as instantiated coordination, passive for weights agent adapt to aim We enwpeetapoce to approaches present now We we candidates, decision same the for executed be to experts all presumes method this While | Λ | + ( | Λ | MD h | + | ( = ) | yntamn ortiv l osbesam- possible all retrieve to aiming not by GD, in done as individually sample Λ h − | 1 E | ) h | o igeaet.Tesm od o esn xetwih learners weight expert reusing for holds same The agent). single a for 2 + p i PASSIVE | E (e Sec- (see EWH on based approach RL Online our to Similar h || E h coordinate ( + 1 1 , | d + s z h | = ) E asv oriainwih w weight coordination passive h || p E oflcigepr yohss hc lowork also which hypotheses, expert conflicting i ( h 1 − , 1 d λ | )( s i z h a nypredict only can | ) K w || . asv oriainwt MEPs with Coordination Passive 6.4 i PASSIVE S | log | S ( | s ) z sdfie o h complex- the for defined as ) λ i ( d i PASSIVE s p w = ) i PASSIVE PASSIVE T e i ( = : d S s : u only but , ) which 1), ) → T ..the i.e. , ( | × MD (6.11) R S for 93 → | ) i . ,

6 Chapter 6 Learning with Multiagent Expert Processes

PASSIVE PASSIVE SCALED d EWH wi (sz + 1) = wi (sz) exp R (λi( sz)) η (6.12) To have agents passively converge to a consensus, we only update the weights if an agent’s hypothesis was incorrectly chosen or incorrectly eliminated by the weighted majority voting 4, i.e.:  Equation6 .12, if R(ds,e (ds)) = 1 ∧ a∗ 6= e (ds)  i i PASSIVE h d d ∗ d wi (sz+1) = Equation6 .12, if R( s,ei( s)) = 0 ∧ a = ei( s) (6.13)  PASSIVE h wi (sz ) otherwise Complexity Analysis 6.2. As passive coordination does not entail to learn models with re- spect to individual actions of agents, significant costs might result from retrieving the weighted majority voting (to update the passive coordination weights). Besides, we only keep and up- date an additional set of weights for CEXP experts (in our special case only a single), resulting in a total complexity of O(CEXP) for passive coordination only. The total cost of the resulting MEP is then also dependent on the calculation of agent weights pi, which are not necessarily required.

As adversarial learning approach, EWH enables robustness with respect to changing data distributions when paired with agent weight learning techniques (e.g. gradient descent on meta dependencies). Still, by taking into account the individual actions of other agents as well as allowing agents to respond to the latter, the expected cumulative reward of the MEP can be further improved.

6.5 Active Coordination with Heuristic Update Rules

For the active coordination protocol we allow multiple coordination steps (i.e. T > 1) and ACTIVE R learn agent weight function pi : T × S → , defined analogously to pi. If there is any disagreement for a decision candidate, all agents can alter their weights to have the committee converge to a consensus. Agents can thus either abandon their prediction or reinforce it. Our first active coordination approach is a heuristic (which we name ACTIVEH), which exploits novel properties for meta dependencies and actions of agents dependent on coordi- nation steps. The central intuition for the heuristic is that a coordination weight update for an agent λi should be be based on the decisions of other agents of the same step which are confident (i.e. dominant) for the current decision candidate. Here, a confident agent is taking d d the action its expert suggests (i.e. λi( s) = ei( s) holds), as the agent’s weight is sufficiently ACTIVE d h d h ∗ high (i.e. pi (t, s ) > Qt ( s ,at ) holds) or other confident agents’ experts are predicting the action, such that the agent involuntarily predicts its expert’s action. If not a single agent

4We assume binary rewards, but the strategy can easily be extended to continuous rewards settings by intro- ducing a threshold.

94 swl sdniist nraeraaiiy stelte r constantly are latter updates. parameter the additional as an readability, takes increase rithm to dependencies densities meta for identifier as parameters their well expert-state omit for as we labels where strategy, intuitive update use weight our well fines we demonstration, of their ease for For fight to agents. try confident both other agents of heuristic where actions The propagate intelligence, chosen. or swarm is hypothesis of weights expert’s idea agent basic initial the the implements on based thus majority weighted the confident, is gn egtetmt ic eol att lg n oriaetelte.W ilnow will We latter. the coordinate and align to algorithm. the want of only parts important we explain since estimate weight agent Require: 2 Algorithm Ensure: 21: 12: 11: 10: 20: 19: 18: 17: 16: 15: 14: 13: 9: 8: 7: 5: 4: 3: 2: 1: 6: 5.1,5.25.3. Table & 5.3 Section in introduced as instantiations dependency meta all use We saprn nteagrtm eol s ariemt eednist paeteinitial the update to dependencies meta pairwise use only we algorithm, the in apparent As • o all for ∆ return n for end p i ACTIVEH ie2-3: Line ← n for end o all for 0 p λ i ACTIVEH κ if if if end n if end if if end p i , i ACTIVEH ∈ cieCodnto ihHeuristics with Coordination Active ( d HORIZON e HORIZON t λ , j if if end if n if end ∆ ∆ K + ( a j eieaeoe l vial enl n agents. and kernels available all over iterate We d ← ← e e t ∗ ∈ s do 1 i i , 0 ( ∆ ∆ ( ( , = ) θ t Λ d d d ∆ ∆ ( + ← ← s s s IMPACT t \ ) = ) ) + + + 1 d λ 6= ← ∆ ∆ θ θ , i ( ( ∧ 1 d FUTURE PAST e e + + do κ κ e e , s i i p λ , , = ) ( = ) j j d ) δ δ i ACTIVEH ( ( θ θ s j d d ( CONFLICT ERROR ) κ κ MD s s d , , ) ) s δ δ HORIZON 0 HORIZON ∧ ∧ ) MD t PAIR κ λ λ , = MD PAST θ ( j j ( ( t IMPACT PAIR κ e t t , , , , MD d . cieCodnto ihHuitcUdt Rules Update Heuristic with Coordination Active 6.5 j FUTURE PAIR κ d d ( s , s s d + ) ERROR = ) = ) s ( PAIR κ 0 ) e ( , CONFLICT θ e ∈ then j e e ) j IMPACT ) j j R θ then ( ( − JOINT d d κ ≥ s s , 1 δ 0 ) ) ) hc aacsteipc fteweight the of impact the balances which then then then ∆ − MD θ ADDITIVE κ PAIR κ , , δ JOINT MD PAIR κ ( e , ADDITIVE i , l e de- 2 Algorithm . j , d s h ) h algo- The . 95

6 Chapter 6 Learning with Multiagent Expert Processes

• Line 4-13: All updates require the respective pairwise agent to be sufficiently confident in its expert hypothesis.

• Line 4-6: The joint performance of two experts of the same horizon is weighted by the independent error of the latter.

• Line 8-9: The weighted conflicting performances of two experts are subtracted to re- trieve the relative performance of agent λi

• Line 12-13: If the current decision candidate was generated from the expert of the respective agent, we update the weight by the respective density-weighted meta depen- dency

• Line 15-16: For experts of the next horizon, the respective density-weighted meta de- pendencies are directly used.

• Line 20: The agent weight is updated by the manually weighted aggregate of all updates for all kernels.

Complexity Analysis 6.3. Other than in passive coordination, we need to include complex- ities resulting from the novel decision process of length T. In the proposed heuristic, one recalculates agent weights in every coordination step t ∈ 1,...,T by comparing their pair- wise predictions. While calculations concerning agents of prior steps practically only need to be made once, pairwise comparisons for all agents of the same step have to be conducted for every coordination step. As a result, the complexity for all updates for a single decision candidate and single step (but all coordination steps) is within O(|Λh−1|λ h| + |Λh|2T). The total cost of the resulting MEP for a given episode additionally consists of the calculation of all meta dependencies and expert weights for all steps and respective decision candidates, i.e. O(DH|Λh|(|Λh−1|+|Λh+1|+|Λh|T +|MD|)) with D again the number of decision candidates for the state.

Although our heuristic provides an intuitive way to update agent weights, it might suffer from changing data distributions as well as high variance in expert performances, as hand- crafted, non-weighted rules are used. We thus continue with a learning-based approach for active coordination, which learns and update model parameters for agent-dependent meta de- pendencies.

6.6 Active Coordination with Reinforcement Learning

Until now, we ignored essential properties of the ECP for active coordination, as we did not account for future actions of agents and did not use knowledge about coordination step limit T. We now approach learning policies for MEPs by explicitly solving the ECP.

96 ii choosing representation, (iii) state as dependencies) meta learning on (ii) (based features agent ex- relational dependent an step find additional to an propose agents we require rewarded), not maximally actions. is dominant does choose action ECP to joint the heuristic correct as a Finally, (i.e. consensus other heuristic). plicit by weight the predicted agent’s actions to an and (similar candidate adapt decision agents to given the learning of by representations multiple i.e. tegrate action, the learning directly and policy the weight izing separate agent learn thus possible We policies predefining actions. coordination correlated by expert highly yields actions and continuous flexibility lacks the which discretize increments, to be would solution etu de- thus We ECP. the in act to how learn to agents for fine representations state novel need We Dependencies Meta Agent 6.6.1 ACTIVEL. approach respective the en t hypothesis. its being bno t xetsprediction expert’s its abandon of confident. performance is the agent approximate respective independently the only if dependency dependency meta meta agent respective an the heuristic, of coordination value active the our of to consists Similar ECP. the of state candidate p i ( l eiulpiws esrsrqiethat require measures pairwise residual All inexact: as behavior, pairwise any conditioned not is agent an of impact future The coordination defining (i) comprises policies coordination expert learning for approach Our One spaces. action continuous with dealing is ECP the solving of problem central The igeepr eadpnece r ietyue saetmt eednis sthey as dependencies, meta agent as used directly are dependencies meta expert Single h atpromneof performance past The t AMD , gn eadpnece AMD dependencies meta agent d s h ) PAIR κ ognrlz ere oiis eueetn eadpnece ocmatyin- compactly to dependencies meta extend use we policies, learned generalize To . , PAST d s h eaieaeterrresiduals error agent relative hr orsodn eadependencies meta corresponding where , AMD outactions robust ( t , AMD λ i , PAIR κ λ , ERROR PAIR κ j , , FUTURE d s AMD = ) ( e t i h apoc yparameter- by approach RL Search Policy a with agent each for after MEP corresponding the for , MD λ and SINGLE κ ( o needn ro ofrhrcniini needed: is condition further no error independent For . i t , , SINGLE , λ λ PAIR κ e j , i h , PAST h j , d − ( λ s t 1 . cieCodnto ihRifreetLearning Reinforcement with Coordination Active 6.6 = ) , j h ( scniindon conditioned is λ + ( t , e i 1 , λ i , MD d , apoc and approach RL Search Policy a with ECP the in i d e s , s ) d j h , s = ) o oriainstep coordination for PAIR κ d = ) λ λ , s ERROR ) i j 1 : ..tersetv ardagent, paired respective the i.e. , MD MD { λ j ( ( PAIR κ t SINGLE κ , e d , , MD FUTURE i 1 h , e − e h 1 j − )= j , ( 1 d e e ( geigwt t xetand expert its with agreeing i s j e T , ( ) ( i d d 1 , e s s oriainses eterm We steps. coordination d h i h { ) − s , λ 1 ) e r xeddbsdo the on based extended are ) j ( } h t j t 1 + , agent , d 1 s { )= e , d j ( e s t , j h d ( ) d h s − λ ) 1 } )= i n decision and d s h } osnot does (6.15) (6.14) (6.16) (6.17) d 97 s h

6 Chapter 6 Learning with Multiagent Expert Processes

The joint performance of two experts λi,λ j is additionally conditioned on two experts agree- ing:

κ,JOINT d κ,JOINT d 1 d d 1 d d AMDPAIR (t,λi,λ j, s) = MDPAIR (ei,e j, s) {λ j(t, s)=e j( s)} {ei( s)=e j( s)} (6.18)

Similarly, pairwise expert measures with regards to additive impact are only used if λi,λ j disagree:

∀l ∈ {ADVANTAGE,ADDITIVE} : l,κ d κ,l d (6.19) 1 j d h d h 1 d h d h AMDPAIR(t,λi,λ j, s) = MDPAIR(ei,e j, s) {λ (t, s )=e j( s )} {ei( s )6=e j( s )}

κ d We now define AMD (t,λi, s), the kernel-dependent agent meta dependency feature vec- h d h tor, for agent λi and decision candidate s in coordination step t :

κ h d 1,κ h d AMD (t,λi , s) = hAMDSINGLE(t,λi , s), l,κ h h d h ∪∀l∈{2,3,4,5}∀λ h∈Λh\λ h AMDPAIR(t,λi ,λ j , s ), j i (6.20) 6,κ h h+1 d h+1 ∪∀λ j∈Λ AMDPAIR(t,λi ,λ j , s), 7,κ h h−1 d h−1 ∪∀λ j∈Λ AMDPAIR(t,λi ,λ j , s) i

The agent meta dependency feature vector is finally defined as union over all available kernels, i.e.:

h d κ h d AMD(t,λi , s) = ∪∀κ∈KAMD (t,λi , s) (6.21) h d Note that the agent meta dependency feature vector AMD(t,λi , s) potentially changes after d h d h each coordination step, while MD(ei, s ) is constant for s . We no, present our strategy to ACTIVEL d h act in the ECP by adapting agent weight functions pi (t, s ).

6.6.2 Policy Search RL for the ECP

As the ECP has a finite-horizon (i.e. T), we can learn T separate policy functions, which directly estimate the continuous action. More specifically, we learn T linear models based h d on agent features AMD(t,λi , s), which are additively combined and driven towards error residuals with respect to the actions an agent takes. We therefore keep T copies of model ACTIVEL ACTIVEL parameters φi,1 ,...,φi,T such that an agent weight at coordination step t is defined as:

98 L weights. their change not would agents residual all if [49] boosting stx riae,weegnrlzto sdfcl o ml riigst.T aeagent make To sets. training small throughout for for difficult voted is generalization more where predictions within images, converge or to not text prone as is ECP the that 1 such steps predictions coordination regions, their unexplored episodes, for different erroneous of be candidates decision might across generalize to have agents As Decisions Agent Robust 6.6.3 as occurs, update action. no committee Otherwise, the action. expert’s its need pursue we should that states function loss the or for label The update action. joint to the for use weight average we current policy, agent the DIST update to MEP, direction ECP, the the which of of in function decide value to the that, Note abandonment. oteatosa gn chooses. agent an actions the to oiyi o-ttoay erigtersligpolicy resulting the Learning non-stationary. is policy label SQUARED ewl o ics o gnsdrv hi nldcsosafter decisions final their derive agents how discuss now will We parameters model for updates GD conduct to need we end, this To DIST where ocp ihdcso yais euetesurdls function loss squared the use we dynamics, decision with cope To where e φ i ( i y y t ACTIVEL i d , , t t t i i d s + , , s d d h h 1 s s ) γ h h stelte eit the depicts latter the as MEP, the of function value the use only we update, the of φ ( ihrsetto respect with = ste acltdas: calculated then is ∈ φ i t i ACTIVEL , [ Q      ntrso h itnet hoigtecretato,i ead for rewards if action, correct the choosing to distance the of terms in t 0 ← h t i , p min max otherwise 0 , o h eutn on cin hncluaigtemnmldsac value distance minimal the calculating When action. joint resulting the for , 1 d i ACTIVEL φ s ] h robust i ACTIVEL stedson atrhpraaee,wihw e to set we which hyperparameter, factor discount the is , t ( = ( ,..., DIST ; | DIST T AMD Q eetn hshuitcwt esr of measure a with heuristic this extend We . h etu aeaet hoeteato hc hyms frequently most they which action the choose agents have thus we , ( T ( t s t i , , h rbe seaebtdfrhg-iesoa oaiis such modalities, high-dimensional for exacerbated is problem The . a − , t d i d ( , a s t ∗ d t s h s t , ∗ γ = ) , h r o eetdin reflected not are λ ) φ | ∇ , i − h i ACTIVEL φ , Q , t φ p i d i ACTIVEL ACTIVEL , p , t t t i ACTIVEL s oass h cino naetadtevlefunction value the and agent an of action the assess to , i ) ( , t y , . cieCodnto ihRifreetLearning Reinforcement with Coordination Active 6.6 d t i , s d L T s h h φ SQUARED ) T ) ( i t − sfrlna ersin nagent-dependent An regression. linear for as , φ 1 ) i , t α d ) s + ) with if if p ( Q Q φ t i 0 ( t t ∈ i ACTIVEL t , ( ( ∑ t α 2 , π s s ,..., d t t i i i ACTIVEL , , s ECP ECP ∈ h t ) γ R , , or AMD a a ; h iia itnet i or win to distance minimal the AMD t t i i , , Q ECP ECP ( t h , ( ( d t ) ) s λ , ( s , T < > i t λ ) a , dominance antatvl improve actively cannot φ i t hsrdcst gradient to reduces thus ∗ λ oriainsteps. coordination , Q Q i ) ACTIVEL d , i h t si mle that implies it as , h h s , γ ) ( ( d T d d = s s s φ ) h h , i t ,a h learned the as 1, , , y 0 a a t i i.e.: , , t t ∗ ∗ d ) ) s ihregards with h ) λ i ( (6.22) (6.23) (6.24) t , d s 99 h λ ) i

6 Chapter 6 Learning with Multiagent Expert Processes

Definition 6.13 (Agent-specific dominance of an action). Given agent decisions in T coor- d h d h d h d h d h d h dination steps λi(1, s ),...,λi(t, s ), let σi( s , a ) = |{t|∃t ∈ 1,...,T : λi(t, s ) = a }| d sh d h d h d h be the number of times λi chose a and Ai = { a |∃t ∈ 1,...,T : λi(t, s ) = a } the set of actions λi has chosen throughout 1,...,T. d h The agent-specific dominance of action a for λi is defined as:

DOM d h d h d h d h  ∆i ( s , a ) = max ∆i( s , a ),0 (6.25) where

h d h c h i d h h d h h c h  ∆i(s , a ) = min {∆|∀ a ∈ Ad sh \ a : σi(s , a ) − σi(s , a ) − ∆ = 0} (6.26)

holds.

d h d h d h ∆i( s , a ) returns the minimal occurrence difference of action a compared to the resid- DOM d h d h ual actions taken by λi. ∆i ( s , a ) only returns 0 if the minimal difference is negative, i.e. action dah is not dominating the residual actions. The resulting dominance measure for agents and action thus expresses the absolute dominance of an action based on all relative dominance values of an action compared to the residual ones. Based on the dominance estimates, we now define a state action value function for each h h ≥0 agent – qi : S ×A → R – which combines the average frequency of an action with its dom- inance. The agent-dependent state action value consists of the normalized choice frequency ≥ ≥ of a potentially penalized by minimal dominance ∆MIN ∈ N 0 and impact factor µ ∈ R 0:

h DOM h h σi(s ,a) ∆MIN − ∆i (s ,a)  qi(s ,a) = max i − µ MIN ,0 (6.27) |Ash | ∆ The minimal dominance parameter is thus to be set according to the necessity of conver- gence or availability of a dominant action. A critical observation is that convergence might depend on the number of coordination steps T which – when set to low – might bias the dominance of an action. h h Finally, the decision of an agent after T coordination steps λi : S → A is defined as action with maximal value:

d h d h λi( s ) = max qi( s ,a) (6.28) d sh a∈Ai

Complexity Analysis 6.4. As we approach active coordination in MEPs, the respective cal- culations are dependent on T coordination steps. Both, the complexity for conducting T active coordination updates for all agents as well as the overall complexity for the resulting MEP are the same as for the active coordination heuristic. More specifically, the set of agent meta

100 fra for MEP resulting the of cost pairwise total on the based result, O is a within is dominance As again, to episode, step. respect given current with the decision of measures). agent comparisons behavioral final agent or the kernels for available generally cost of gradi- or the numbers although agents Finally, higher bound, of (i.e. number lower increasing dependencies our with meta for complex richer complexity more significant becoming add are not computations do ent updates heuris- GD coordination active The the as tic. comparisons agent pairwise same the requires dependencies uevsdlann,weew es eadpnece sfauesaefrlna regression. linear for space feature as dependencies meta reuse we where ECP. learning, the supervised with correlation expert and ad- to frameworks expert due decision-making biases multi-step prevalent reducing to for as such compared coordination challenges, properties expert crucial unique approaches study has to MEP MEP s A formalized vice. we chapter, this In Summary 6.8 actions. predictions expert function of agent set complete define the keep could one so, agent infinite- require might domains Other known). is approaches. process horizon bound coordination available the for an horizon require the we single (i.e. addition, a In having by approaches. remedied novel be quires could This available. are experts meta-agent of amounts large our when indi- sible apply actions updates. available can weight all conflicting One to exploit heuristic to potential implement. action open robust to is the trivial there as but well is vidually, as which continuous approach comparisons, RL with requires Search expert vectors then Policy of comprise features agent then number relational would larger Calculating ECP a action. available the each for for space adaptions weight action to extended The be of can them. action methods weights weighted Our highest effects. the reciprocal lose uses ordination might only one agent decision, each local that its assumption simplifying our Given Discussion 6.7 most the optimized. are is costs agents dependent among step communication coordination when The reduced directly state. are and the significant for candidates decision of ber stig eepoe online explored we setting, MEP budget-free the in experts their assess to agents enable To that assumption simplifying the make we that note Finally pos- be not might which times, all at executed are experts all that assume approaches Our λ i hra h hsnato ih epooe yvroseprsin experts various by proposed be might action chosen the whereas , hr naetde o ii h vial cin eoecodnto,btonly but coordination, before actions available the limit not does agent an where , rslc xet adaet)o ydsrbtn ugt mn gns hc re- which agents, among budgets distributing by or agents) (and experts preselect λ i swl saetwih function weight agent as well as ( DH | Λ h | ( | Λ h − 1 | + | Λ h + 1 | E + i p h i | e o h oriainpoesand process coordination the for Λ i ootu etro l expert all of vector a output to ersnsteato hieof choice action the represents h | T + T | MD o oriainsteps coordination for | )) ihDtenum- the D with . Discussion 6.7 E i h vnmore Even . ulco- full 101

6 Chapter 6 Learning with Multiagent Expert Processes

We approached the coordination problem in MEPs with three different approaches, assum- ing different degrees of available information. In the passive coordination setting, we reused the EWH algorithm to learn additional weights to adjust available expert confidences. When assuming perfect information for each agent, coordination for ECP can be learned, which we dealt with in two further approaches. The first active coordination method we introduced is based on a heuristic, which leverages pairwise meta dependencies as well as coordination step dependent agent predictions. For our learning-based active coordination approach, we defined agent meta dependencies by taking into account pairwise coordination step dependent agent decisions, which enable the application of function approximation to agent coordination for MEPs. Our expert coordination approach was based on gradient boosting and a robust action selection heuristic, which enabled to learn policies for the ECP as well as corresponding MEP. Our work on MEPs and diverse coordination approaches addressed Research Question3, where the corresponding Hypothesis3 is partially confirmed and will be revisited after the empirical evaluation in Chapter7.

102 step NED the from resulting candidates ( decision Finally, candidates. decision of combinations in ( proposed NED entities For named all hypotheses. their queried by having termined After set construed. of is distribution experts state-expert NER the which for candidates decision n nlysummarize finally and 7.5. 7.3 ) Section implemen- (Section in baseline results evaluation chapter as the the well discuss as and methods present our then We of tations. instantiations parameters, all explain we where TEXT TEXT tokens of 7.1 Definition knowledge structured (semi-) ( a NER in available For resources base. i.e. entities, named disambiguated over H contributions and challenges the 6. to & refer 5 thus We MEP Chapter or 2.1. in EP Section described as in introduced modeled as tasks NERD, multi-step task finite for for methods learning introduced all evaluate We Introduction 7.1 validating is evaluation. chapter references latter. empirical introducing this Our an of the by goal on solve 6 The based and 3. contributions to and respective 5 methods 2 the 1 , Chapter as Hypothesis by on well based approached approaches as partially respective the advice been have expert questions with research tasks These multi-step for framework a chapter This MEPs & EPs with Learning of Evaluation 7 Chapter h = o ic ftext of piece a For 7.2), (Section evaluation our of setup the describe first We follows: as proceeds chapter The h eutn ut-tptssa ela oee eiinpoesshv finite-horizon have processes decision modeled as well as tasks multi-step resulting The = with 2 eprso set of experts NED queried by proposed entities named disambiguated are 3) s salpsil eune TEXT sequences possible all as S 1 tt pc vrwords, over space state a hc r ocre ihdeveloping with concerned are which 3, and 1,2 Question Research with deals ( n s h 104]. [100, are chapter this for i -grams) hr eoe h oiino h oe,w en h e fngasfor n-grams of set the define we token, the of position the denotes i where = E ) eiincniae are candidates decision 1), TEXT 1 eko h culnme fpsil eiincniae,a de- as candidates, decision possible of number actual the know we , . ie ic ftx fsaes–TEXT – s state of text of piece a Given s l possible all , h = n h e falsae scntutdbsdo l valid all on based constructed is states all of set the and 1 S s 2 i ,..., n tt pc vrnmdette and entities named over space state a gaswith -grams h = TEXT ) h e fdcso addtscnit of consists candidates decision of set the 2), n gaswihw o en ngeneral. in define now we which -grams s i + n n . = 1 ,..., | Text s enda ree set ordered as defined – s | aeu h possible the up make S 3 tt space state a E 2 . 103

7 Chapter 7 Evaluation of Learning with EPs & MEPs

Kernel Implementation Candidate Lingual Type POS tag Candidate Word embeddings Pretrained on articles Candidate MinHash Jaccard similarity State Length Number of characters State Extra Characters Count #, @ State Lingual Type Average POS tag

Table 7.1: Used kernels for evaluation.

7.2 Setup

Our experimental setup description is divided into data sets, meta dependencies as well as their configurations, Web services used as experts and, finally, the evaluation measures with respective baseline- and weight learning approaches.

7.2.1 Data The data sets we use are (i) Microposts 2014 corpus [10] which consists of a collection of tweets (Z = 2034) of no specific topic and (ii) Spotlight corpus [91] which consists of news articles of no specific topic, each consisting of several sentences (Z = 59), where Z corre- sponds to the number of resulting episodes forEPs or MEPs (i.e. the number of tweets or sen- tences, respectively). While Microposts 2014 comprises an average of 1.84 entities per tweet, Spotlight is denser with an average of 5.69 entities per sentence. Also, the tweets available in Microposts 2014 contain significantly more grammatical errors, abbreviations and extra char- acters such as hashes, symbols or hyperlinks than news articles in the Spotlight corpus. For modelingEPs and MEPs, we treat each tweet and each sentence within an article as single state sample, i.e. as piece of text TEXTs as defined for n-grams. Decision candidates thus comprise tokens or token sequences.

7.2.2 Meta Dependencies The central variables for meta dependencies are the used kernels to calculate similarities among states and/or decision candidates. After presenting the kernels we use, we list the chosen values for density parameters Mκ and δκ . Kernels: We summarize the kernel set K we used in Table 7.1, where state or candidate indicates whether the kernel is calculated for the complete sentence (or tweet), or for the decision candidate, respectively. The decision candidate kernels assess a token or token sequence in terms of different perspectives. The lingual kernel is based on part-of-speech (POS) tags, where each possible

104 3.23). Definition (see fold a in episodes of number the vary We to neighborhood. a de- of is latter part The neighborhood. parameter kernel-induced on the pendent in candidates decision of amount expected among Densities: choose decision to a ability its for in context experts available respective of the hypotheses. candidate influence length heavily the might models which context, thus and Both article candidate. news a in sentence The candidate. sn h atr ecnbte cl h optto ftekernel. the of computation the scale better can we latter, the Using large on trained Web were the the embeddings on different The articles occur. two news often employ of latter amounts the we contexts sequences the on token in based categories or candidates respective tokens the compare compares directly The simply To kernels. function kernel fashion. the binary and a category a depicts tag apoce,ie eddntueWbsrie which services Web use not did we i.e. approaches, NED chose respective pure we their where and of GERBIL, approaches any benchmark NERD tune NER on not pure based do was and process selection services The Web parameters. as available experts use exclusively We Experts We 7.2.3 small. relatively are sets parameters. training both used for the valuable thresholds as any lower return used, choose contrast, never thus In are might thresholds thus neighborhood. and the high specific in if more states neighborhoods are of candidates number decision the for to as kernels regards well the with thresholds as higher value assigned similarity get required thus the and candidates, occurring all of average the The enl o ttsgtudtdmr fe,a hyete r eaieygnrlo oss of consist or general relatively are either they as often, more updated get states for Kernels 2 1 iHs kernel MinHash • • • • https://github.com/ekzhu/datasketch/ https://code.google.com/archive/p/word2vec/ M M δ δ tt-eedn kernels state-dependent length embeddings length embeddings o acltn h este,w edt e parameter set to need we densities, the calculating For = odebdigkernel embedding word = δ M extra egho state a of length = xr characters extra = extra δ M minhash = 2 δ minhash = δ κ oprstersetv icso et ae ncaatroverlap. character on based texts of pieces respective the compares tt lingual state enn h iiaiytrsodfradcso addt obecome to candidate decision a for threshold similarity the defining , M tt lingual state = = δ addt lingual candidate M r eaieygnrladrpeettecneto decision a of context the represent and general relatively are addt lingual candidate M sdfie sttlaon fcaatr ihnatetor tweet a within characters of amount total as defined is = swl as well as κ = 0 sue ognrlz h oprsno et fdecision of texts of comparison the generalize to used is 1 and . 7 n r ietyrue o u vlain ncontrast, In evaluation. our for reused directly are and Z 4 δ κ = o h enl sflos where follows, as kernels the for iga types lingual = 0 . 5 ( 10 cesdo 05/01/2018 on accessed Z ( cesdo 05/01/2018 on accessed hnrfieteassmn fthe of assessment the refine then M ) κ hc xrse the expresses which ) Z corresponds . Setup 7.2 105

7 Chapter 7 Evaluation of Learning with EPs & MEPs

NER Expert NED Expert Stanford Tagger (ST)[45] AIDA (AIDT)[56] Spotlight Spotter (SPS)[91] Spotlight Tagger (SPT)[91] FOX [119] AGDISTIS (AGDT)[130]

Table 7.2: Experts for NER & NED.

Microposts ’14 Spotlight Appr. F1 Prec Rec F1 Prec Rec ST 0.49 0.89 0.34 0.22 0.86 0.13 FOX 0.49 0.91 0.33 0.22 0.84 0.13 SPS 0.56 0.4 0.89 0.64 0.53 0.81 AGDT 0.45 0.6 0.36 0.35 0.7 0.24 AIDAT 0.32 0.45 0.25 0.24 0.29 0.2 SPT 0.65 0.74 0.58 0.76 0.79 0.72

Table 7.3: Baseline performances of experts. solve the complete NERD task without allowing to input texts annotated with named entities for NED. Table 7.2 summarizes the experts we used for NER and NED, where the horizontal alignment expresses how the experts are actually used in combination in the real-world. Table 7.3 shows the baseline precision, recall and F1-measure outcomes for the chosen data sets, which are the proposed evaluation metrics by GERBIL and defined in Section 3.2.5. Experts SPS for NER and SPT for NED provide the highest recall for tweets and news arti- cles. SPS does, however, yield relatively low precision scores, resulting in numerous wrongly identified named entities. Note that FOX andST strongly correlate, as FOX is an ensemble learning approach usingST. Central learning challenges thus are to detect correlations of FOX andST as well as learning for which data characteristics SPS is correct.

7.2.4 Evaluation measures We evaluate our approach in terms of two target measures: a) accuracy estimation of experts and b) outcome optimization of theEP or MEP. For each target measure, we perform 10- fold cross-validation and use precision, recall and F1-measure (as proposed by GERBIL) as metrics. a) Accuracy Estimation: The goal is to accurately estimate weights (i.e. weh ) for all experts and thus to predict if the suggested action of an expert is correct or not. Here, we are not in the fullEP or MEP setting

106 factor: candidate and decision a condition for dependencies meta single-expert all expert on Based . 5.3.1 Section in fold. respective the of data training left-out the on expert the optimize to weights on expert the adapt they as vote. approaches, majority coordination weighted MEP use not do a wrong. and correct is a pothesis wrong, a is correct, hypothesis is expert’s state. hypothesis an single expert’s a for an once that experts all predicting execute we i.e. constraints, budget with deal not do thus and )Otoeoptimization: Outcome b) l eadpnece olanepr egt.Nt htw nytano we tts even follows: states, as tweet set on are train parameters only The we with states. that 0 budgets article Note expert news with for weights. deal predicting expert when to learn algorithm to EWH dependencies the meta extend all we where de- OnEWH-All, meta follows: tagged all as with set are OnGD-Single parameters for The as weights. configuration expert GD learn same to pendencies the use we where OnGD-All, follows: Here, as set weights. are expert parameters learn The to learner. framework as dependencies learning meta online all the with use we GD use we where OnGD-Single, oe.Nt ht gi,w nytano we tts vnwe rdcigfrnw article news for predicting when even follows: states, as PSL tweet set of are on parameters train implementation The only existing states. we an again, that, on Note rely model. we where BatchPSL, tagged . 8 Single-Avg enwdsrb l aeie swl sisatain forapoce.Nt htwe that Note Gl-Avg approaches. our of instantiations as well as baselines all describe now We metrics:A performance all for properties following the define We OnEWH-All OnGD-All OnGD-Single BatchPSL-All left-out , 4 3 β https://github.com/linqs/psl https://github.com/JohnLangford/vowpal_wabbit/ = e i.e. , 1 h rtbsln stesrihfradgoa efrac egtn o experts for weighting performance global straightforward the is baseline first The : . 5. .Teepr egti acltda rcso fan of precision as calculated is weight expert The [113]. to similarly data, training w MD ,tagged 6.3), Section (see approach learning online our of instantiation An : h eodbsln sbsdo igeepr eadpnece sdefined as dependencies meta single-expert on based is baseline second The : e θ = 5.4), Section (see approach RL model-free online our of instantiation An : SINGLE κ ,tagged 6.3), Section (see approach learning online our of instantiation An : h est) ecmuetewihe vrg ihdniisa weighting as densities with average weighted the compute we density), the ), 5.5 Section (see approach RL model-free batch our of instantiation An : , ∑ l MD SINGLE κ ( , PRECISION e , as negative false ∑ d s MD ) (with SINGLE κ ( , PRECISION e , d s ) as positive false κ ∈ MD h kernel, the oplWabbit Vowpal ( ( e SINGLE cesdo 05/01/2018 on accessed , swogypeitn hta xetshptei is hypothesis expert’s an that predicting wrongly is d Z s ) POOL ∈ MD ( e η , d SINGLE s = ) = l sawogypeitn hta xetshy- expert’s an that predicting wrongly a is θ h ntnito o eairlmaueand measure behavioral for instantiation the 0 PRECISION κ Z . renegative true , ( 2. BATCH δ e , d 3 s ) configuration GD standard with [81] θ PRECISION κ = , ) δ ( d ( 10 Z cesdo 05/01/2018 on accessed s h . ) MD scretypeitn that predicting correctly is ( repositive true d SINGLE κ s , η h PRECISION ) = 4 0 p . oitgaeour integrate to 2. min = scorrectly is ) . Setup 7.2 0 . 05 d , (7.1) s η 107 for =

7 Chapter 7 Evaluation of Learning with EPs & MEPs

Our second goal is to optimize the overall outcome of theEP or MEP, i.e. to maximize the expected cumulative reward. We define the following properties for all performance metrics: A true positive is a correctly found (disambiguated-) named entity, a true negative is the correctly predicted abstinence of a (disambiguated-) named entity, a false positive is a wrongly found (disambiguated-) named entity and a false negative is the wrongly predicted abstinence of a named entity. We again describe all baselines as well as instantiations of our approaches. The general 0 0 1 1 budget parameters – if not defined otherwise – are BQUERY = |E | and BQUERY = |E |. AIDA: The NERD approach as deployed in the real-world with e1 = ST, e2 = AIDAT. DBS: The NERD approach as deployed in the real-world with e1 = SPS, e2 = SPT. AGD: The NERD approach as deployed in the real-world with e1 = FOX, e2 = AGDT. EP-Gl-Avg: The firstEP baseline, where we use Gl-Avg as described before, now inte- grated into anEP. EP-Loc-Avg: The secondEP baseline, where we use Loc-Avg as described bfore, now integrated into anEP. EP-OnGD-All: Our online learning approach OnGD-All now integrated into anEP. EP-OnEWH-All: Our online model-free RL approach OnEWH-All now integrated into an 0 1 EP. The additional parameters for expert execution budgets are BQUERY = 2 and BQUERY = 4, i.e. two (state,expert) pairs can be queried for NER and four (state,expert) pairs for NED. EP-BatchPSL-All Our batch model-free RL approach BatchPSL-All now integrated into 0 1 anEP. The additional parameters for expert execution budgets are BQUERY = 2 and BQUERY = 4, i.e. two (state,expert) pairs can be queried for NER and four (state,expert) pairs for NED. MEP-PCoord-Free: Our passive coordination approach integrated into a MEP with no expert weight estimation technique. The parameters are set as follows: ηEWH = 1.0 MEP-PCoord-OnGD-All: Our passive coordination approach integrated into a MEP with OnGD-All as expert weight estimation technique. The parameters are set as follows: ηEWH = 1.0 MEP-ACoordH-Free: Our active coordination approach integrated into a MEP with no expert weight estimation technique. The parameters are set as follows: θ IMPACT = 1.0, T = 10 MEP-ACoordH-OnGD-All: Our active coordination approach integrated into a MEP with OnGD-All as expert weight estimation technique. The parameters are set as follows: θ IMPACT = 0.1, T = 10 MEP-ACoordL-Single-Free-All: Our active coordination approach without gradient boosting and no initial confidence. Here, one single learner is invoked and updated T times. GD The parameters are set as follows: T = 7,α = 0.5, µ = 0.3,∆MIN = 3,η = 0.4. MEP-ACoordL-Single-Conf-All: Our active coordination approach without gradient boosting and expert confidence based on the provided Web service. Here again, one sin- gle learner is invoked and updated T times. The parameters are set as follows: T = 7,α = GD 0.5, µ = 0.3,∆MIN = 3,η = 0.4.

108 n.Teprmtr r e sfollows: as set are parameters The ing. n mrvsterslswt epc otepr aoiyvtn G-v)a ela the as well as (Gl-Avg) voting majority pure methods the prior to to respect close with results both stable results for provides dependencies) the (BatchPSL-All) meta improves approach single-expert and SRL only improves Our (using OnGD-All for GD-Local sets. dependencies to data meta respect all using with performance that results all noticeable for the is dominating it is F1-measure OnGD-All end, of articles, this news To terms For in metrics. recall. results and overall precision best as the well reach as dependencies meta all with (OnGD-All) NER: Estimation Accuracy for Results of Summary 7.3.1 7.6). Table (see optimization outcome for b) estimation and accuracy 7.5) a) & for 7.4 Table results evaluation the summarize first We Results 7.3 service. Web provided the on based confidence pert service. Web provided the on based confidence pert service. Web provided the on based confidence pert follows: as set are parameters The ing. follows: as set are parameters The ing. ae noreauto eu,w o rsn n ics h epcieresults. respective the discuss and present now we setup, evaluation our on Based MEP-ACoordL-Grad-Conf-All-3 MEP-ACoordL-Grad-Conf-All-2 MEP-ACoordL-Grad-Conf-All-1 MEP-ACoordL-Grad-Free-All-3 MEP-ACoordL-Grad-Free-All-2 MEP-ACoordL-Grad-Free-All-1 GD pure and (OnEWH-All) algorithm EWH the on based methods our tweets, For Eauto eut o )acrc siainwith estimation accuracy a) for results Evaluation 7.4: Table Appr. BatchPSL-All OnEWH-All OnGD-All OnGD-Local Loc-Avg Gl-Avg 0.82 0.85 0.85 0.84 0.76 0.71 F1 irpss’14 Microposts u ciecodnto prahwt rdetboost- gradient with approach coordination active Our : u ciecodnto prahwt rdetboost- gradient with approach coordination active Our : boost- gradient with approach coordination active Our : eso fMPAorLGa-reAl3wt ex- with MEP-ACoordL-Grad-Free-All-3 of Version : ex- with MEP-ACoordL-Grad-Free-All-2 of Version : ex- with MEP-ACoordL-Grad-Free-All-1 of Version : T T T 0.8 0.84 0.78 0.82 0.65 0.75 Prec = = = 14 14 7 , α , , α α = 0.85 0.87 0.93 0.86 0.93 0.67 Rec = = 0 1 0 . 5 . . 0 5 , µ , , µ µ = 0.79 0.82 0.86 0.79 0.78 0.7 F1 = = 0 0 0 . 3 . . , 05 3 Spotlight ∆ , ∆ , MIN 0.78 0.84 0.96 0.95 0.66 0.69 Prec ∆ MIN MIN = = h 3 = 6 , h = η 0.81 0.8 0.78 0.69 0.94 0.72 Rec 6 , = η , GD and 1 η GD 1. GD = = . Results 7.3 0 = h 0 . 4. = 0 . 4. . 4. (see 2 109

7 Chapter 7 Evaluation of Learning with EPs & MEPs

Microposts ’14 Spotlight Appr. F1 Prec Rec F1 Prec Rec Gl-Avg 0.56 0.63 0.51 0.42 0.39 0.46 Loc-Avg 0.71 0.6 0.87 0.56 0.44 0.78 OnGD-Local 0.64 0.74 0.57 0.44 0.64 0.33 OnGD-All 0.85 0.8 0.92 0.78 0.74 0.83 OnEWH-All 0.73 0.67 0.81 0.67 0.62 0.72 BatchPSL-All 0.8 0.73 0.88 0.74 0.72 0.77

Table 7.5: Evaluation results for a) accuracy estimation with h = 2.

Microposts ’14 Spotlight Appr. F1 Prec Rec F1 Prec Rec DBS 0.33 0.24 0.51 0.45 0.38 0.57 AGD 0.28 0.59 0.19 0.13 0.56 0.07 AIDA 0.33 0.69 0.21 0.24 0.69 0.15 EP-Gl-Avg 0.41 0.51 0.34 0.36 0.48 0.29 EP-Loc-Avg 0.47 0.59 0.38 0.42 0.49 0.37 EP-OnGD-All 0.44 0.71 0.33 0.35 0.58 0.25 EP-OnEWH-All 0.54 0.6 0.5 0.43 0.44 0.42 EP-BatchPSL-All 0.56 0.72 0.46 0.48 0.64 0.39 MEP-PCoord-Free 0.35 0.69 0.24 0.25 0.88 0.15 MEP-PCoord-OnGD-All 0.46 0.67 0.34 0.48 0.69 0.36 MEP-ACoordH-Free 0.41 0.72 0.29 0.44 0.48 0.41 MEP-ACoordH-OnGD-All 0.5 0.79 0.37 0.63 0.7 0.57 MEP-ACoordL-Single-Free-All 0.37 0.52 0.29 0.44 0.49 0.39 MEP-ACoordL-Single-Conf-All 0.32 0.49 0.24 0.39 0.56 0.31 MEP-ACoordL-Grad-Free-All-1 0.48 0.61 0.41 0.58 0.6 0.57 MEP-ACoordL-Grad-Conf-All-1 0.42 0.53 0.34 0.54 0.56 0.52 MEP-ACoordL-Grad-Free-All-2 0.51 0.65 0.42 0.51 0.59 0.47 MEP-ACoordL-Grad-Conf-All-2 0.43 0.55 0.34 0.6 0.61 0.59 MEP-ACoordL-Grad-Free-All-3 0.41 0.6 0.31 0.52 0.56 0.48 MEP-ACoordL-Grad-Conf-All-3 0.45 0.59 0.37 0.51 0.54 0.48

Table 7.6: Evaluation results for b) outcome optimization

110 CodFe) sol rcso a asdbtrcl erae.Smlrcniin are conditions Similar F1-measure decreased. in improvements recall small but however, (MEP- raised where, tweets MEP-PCoord-OnGD-All for was vote for precision majority prevalent non-weighted only the of as F1-measure the PCoord-Free), improve to able not was non-monolithic prior values over recall improvements average but only DBS) are to methods. EP-OnEWH-All compared tweets and improvements for latter precision as the average trends for EP-OnEWH- same yields to contrast the only (in but which precision expressive improve EP-OnEWH-All All further less can and are EP-BatchPSL-All results EP-BatchPSL-All observed. the be where can tweets, articles, on EP- news trained where only For recall, were DBS. performing for for only than holds while Especially worse recall, same non-monolithic slightly for The results articles. for overall similarly news best latter). the (and as provides the OnEWH-All EP-BatchPSL-All well behind by remaining as improved while tweets considerably EP-OnEWH-All on are metrics values precision performance tweets, all for results turn, the in and, and values tweets recall lower both reaches for values – precision baselines (EP-OnGD-All) F1-measures. higher non-monolithic dependencies prior yielding meta news than while all for and articles – wins news GD and DBS on results based and the approach improve average our cannot stay for end, recalls hold this but To conditions observed, to articles. same are respect where The (EP-Loc-Avg), improvements with dependencies small meta results. recall single-expert further average superior on based an provides voting reach still majority weighted only DBS the does articles, it news baselines, For prior DBS. DBS to recall, compared For news tweets For for worse. performs recall. still before. high DBS than relatively clearer but and even closer, AIDA precision and are AGD low precision butoutperforms show for precision DBS results high of relatively the results reach articles, AIDA the and and AGD expert recall Here, single the low F1- 7.3 . for of Table observable terms was in as in shown recall, tweets and performances for precision results in close differ reach significantly but AIDA measure, and AGD DBS, baselines monolithic The Optimization Outcome for Results of Summary 7.3.2 especially – latter the than worse Gl- OnGD-All. performs methods to GD-Loc baseline compared that the however, dominates Notice, still Loc-Avg. extension) and EWH and Avg sets our data (i.e. both the OnEWH-All However, for approaches measures. measures. residual all all the to supersedes regards now with BatchPSL-All learner results relational overall best the reaches dependencies further meta no with (Loc-Avg) dependencies kernels. meta respective expert the of single weightings with voting majority weighted 6.4 Section in introduced as coordination passive approaches, coordination with Starting improve further – EP-BatchPSL-All and EP-OnEWH-All – EPs for methods learning Our F1-measure the improve already can (EP-Gl-Avg) voting majority non-weighted the While NED: (nDAl ihall with (OnGD-All) GD pure on based method our articles, news and tweets both For . Results 7.3 111

7 Chapter 7 Evaluation of Learning with EPs & MEPs are observable, as recall was improved, while precision remained stable. While for MEP- PCoord-Free the results are the same for news articles, MEP-PCoord-OnGD-All considerably improves the results of MEP-OnGD-All for all performance metrics. The results for our active coordination heuristic (see Section 6.5) show the same tendencies as for passive coordination, while the individual outcomes for each performance metric are better. For both tweets and news articles without available expert weights (MEP-ACoordH- Free), the results stay stable with respect to MEP-Gl-Avg, where precision increases but recall slightly drops. WithGD and all meta dependencies as expert weights (MEP-ACoord-OnGD- All), we can significantly increase F1-measure and precision for both tweets and news articles. Here, the highest overall precision is reached for both data sets and respective recall values increase with respect to MEP-OnGD-All. Finally, observing the outcomes of our learning-based active coordination approach (see Section 6.6), we first discuss results of the single learner (i.e. not taking advantage of having a finite-horizon). Here, for MEP-ACoordL-Single-Free-All without expert weights as well as for MEP-ACoordL-Single-Conf-All with using Web service confidences only below-average outcomes for all performance metrics and both data sets are achieved. The results signifi- cantly improve by exploiting gradient boosting with individual models for individual coordi- nation steps for all instantiated approaches (MEP-ACoordL-Grad-Free-All-x, MEP-ACoordL- Grad-Conf-All-x with x = 1,2,3) and both data sets. The performance of MEP-ACoordL- Grad-Free-All-2 is most notable, as the F1-measure for tweets improves on the results of MEP-ACoordH-OnGD-All by yielding significantly higher recall without using initial expert weights. For news articles, the results for using initial expert weights (e.g. MEP-ACoordL- Grad-Conf-All-2) are competitive by yielding the highest overall recall of all methods, while achieving adequate results for precision.

7.4 Discussion

We first deal with ourEP approaches and then interpret the general behavior of meta depen- dencies, as observable throughout all learning methods. We subsequently discuss the perfor- mances of our MEP techniques, which include coordination.

7.4.1 (EP-)OnGD

For our online learning approach based onGD, we distinguished among OnGD-Local with single-expert meta dependencies and OnGD-All with all meta dependencies. OnGD-All did yield superior performances for expert accuracy estimation on both data sets. The latter can be explained by the ability to learn a joint model over all experts while not having to account for missing features caused by expert budgets. The superior performance of OnGD-All for accu- racy estimation is partially based on the performances of the individual experts. Both ST and FOX are relatively precise, but leave out several true positives (see Table 7.3), which makes

112 diinltxulkres oeigmr eiincniae,mgtsbtnilyimprove influential, substantially most might were to candidates, kernels adaptation decision textual well, used more scale the covering and As the kernels, extensible both easily textual straightforward. As are is additional learners domains power. online or predictive the tasks the text as new improve state well they for as heuristics, kernels model are Although PSL thus characters dependencies estimates. -extra meta well-balanced and The provide length weights). to high other received each rules either complement negative rules de- while positive or weights), meta (i.e. weights high effect expert low opposite received the received intra-step rules have positive dependencies pairwise meta as (i.e. expert inter-step one well pairwise towards as weights single- drive both that pendencies show meta-weights learned The Dependencies Meta of Results Qualitative 7.4.4 for experts of performances as the training generalizes for better states eventually of it states. mini-batches residual prediction, available recall. exploits for all and approach pools dominating precision SRL as thereby the for well As F1-measure, results overall good average. on achieving best opti- approaches by the outcome positives yields for true apparent thus disclose EP-OnEWH-All became EP-BatchPSL-All and to As EP-OnGD-All learned learners articles. also online news – than and other tweets both – as for EP-BatchPSL-All competitive well accu- mization, remained as for and NED performances baselines and all resulting over NER The improvements yield experts. did is non-queried estimation and with racy dependencies meta deal of to nature flexible relational sufficiently the exploits (BatchPSL-All) approach SRL Our (EP-)BatchPSL-All 7.4.3 bias, a incorporated thus correlation. weights expert expert to due combined be The might which average. ac- (EP- on expert learner individual dominant relational the were although the curacies worse to still slightly compared performed OnGD-All Still, EP-OnEWH-All EP- BatchPSL-All), While EP-Loc-Avg). weighted different. and as (EP-Gl-Avg well were cases. as votes optimization non-weighted- most majority the outcome over Even for improvements for considerable OnEWH-All results yield tweets. did the and for OnEWH-All NER, NER results for for better OnEWH-All well reach dominated performed did especially stable it so, show where did extendedmore results cases, algorithm estimation EWH all accuracy Its the for information. on improvements partial based with is deal OnEWH-All to measures learner with online our OnGD, than Other (EP-)OnEWH-All 7.4.2 to respect with For average precision. positives. high only true relatively find is with to recall, EP-WM-GD-All harder and of significantly F1-measure is performance it the imprecise, optimization, is SPS outcome As assess. to easier them . Discussion 7.4 113

the predictions. Our approaches thus also enable to exploit numerous embedded text representations at once. Finally, our results suggest that meta dependencies learned on tweets also improve learning expert weights for news articles, making knowledge transfer possible. Until now, we only dealt with EP results based on expert weight learning methods for SAS. Based on the latter, we now discuss the results of our passive- as well as active coordination techniques.

7.4.5 PCoord

In passive coordination, we used the EWH algorithm to adapt expert weights according to the joint performance of the respective agents, without using information about the actions taken by other agents. Only MEP-PCoord-OnGD-All, our approach using GD with all meta dependencies, was able to generate considerable improvements with respect to EP-OnGD-All. The technique is thus not sufficiently expressive to deal with the non-weighted case (MEP-PCoord-Free). However, when initial expert weights are available, the method can stabilize the decentralized majority voting of agents.

7.4.6 ACoordH

Other than in passive coordination, our active learning heuristic used agent meta dependencies to gradually align agent weights throughout the ECP. Our results suggested that active coordination can significantly improve both non-weighted as well as weighted majority votings, where in the weighted case (MEP-ACoordH-OnGD-All) the best precision- and F1-measure scores are reached for tweets and news articles. As active coordination with respectively adapted meta dependencies dominates passive coordination, our results suggest that the latter are adequate for the coordination task. The outcomes for recall were successfully optimized for news articles, but our method was not able to achieve the same goal for tweets. As the approach is a heuristic, it cannot dynamically adapt its weight updates for states with heterogeneous characteristics, which is potentially more important for tweets, as the latter are generally shorter and contain larger degrees of colloquial language. Finally, the choice of the coordination step limit T did not significantly impact our results if T > 5 was chosen. In most cases, if no consensus was found after 5 coordination steps, the results remained the same.

7.4.7 ACoordL

In our learning approach for active coordination, we use a Policy-based RL approach to coordinate expert weights in the ECP. Available initial expert weights are constrained to uniform weights (as used in MEP-ACoordH-Free) as well as to available expert service confidences (not available for several experts, namely AGDT, AIDT, ST and FOX). The evaluation thus emphasizes the performances of non-weighted or partially weighted coordination techniques

(in contrast to MEP-ACoordH-OnGD-All). As gradient boosting is only used as a single learning instantiation (MEP-ACoordL-Single-x-All) and provides average baseline performance, we do not deal with it here.

On average, the best results were reached by MEP-ACoordL-Grad-x-All (with $x = \{\text{Free},\text{Conf}\}$), where the central difference to MEP-ACoordL-Grad-x-All-1 is having more coordination steps ($T = 14$ versus $T = 7$), and the central difference to MEP-ACoordL-Grad-x-All-2 and -All-3 (with $x = \{\text{Free},\text{Conf}\}$) is a lower distance to the majority voting ($\alpha = 0.5$ versus $\alpha = 1.0$) as well as a higher non-dominance penalization ($\mu = 0.3$ versus $\mu = 0.05$). While running our experiments we also made the empirical observation that the steps for reaching stable joint models decreased over early episodes if initial expert confidences were available and actions started to generalize; if no expert confidences were given, coordination converged faster. With respect to parameter tuning, we suggest to (i) test larger values for reaching stable joint penalization, (ii) keep distance $\alpha$ between agent weights low, (iii) try out stronger non-dominance penalization ($\mu$) and (iv) potentially test other initial expert weights. Those findings might help to reproduce the results and open up questions for more empirical and theoretical research.

The centralized approach EP-BatchPSL-All still outperforms all configurations for tweets, but MEP-ACoordL-Grad-Conf-All-2 dominates the latter for news articles in terms of all performance metrics, and the heuristic coordination approach with GD expert weights (MEP-ACoordH-OnGD-All) competes with it. Note that the decentralized active coordination deals with $C_{EXP} = 1$, where only a single expert-state configuration can be chosen. The active coordination learner is thus better able to exploit available initial weights and can also slightly adapt uniform initial weights to derive adequate agent weights for the weighted majority voting, while it was able to achieve the highest recall of all approaches throughout the evaluation.

7.4.8 Design Choices

Our central design choice for our experiments (other than parameters) was the use of raw text-measures for the kernels. As we chose straightforward measures, our approaches are transferable to other NLP tasks. For other tasks, e.g. in image processing, one needs to use embedded vector similarities (or develop other kernel functions), or include other available experts. Further design choices for our experiments are the one-to-one mappings from agents to experts for the NER and NED steps as well as the selected experts, which exhibit central problems for multi-step tasks, namely high precision with low recall or expert correlation. The practical complexity is constrained by the chosen expert budget, as the number of agents results in all experts being queried, or vice versa.


7.5 Summary

In this chapter, we evaluated the suitability of EPs and MEPs, and their respective RL- and coordination approaches, for NERD. The results for our proposed learning methods for EPs and MEPs showed promising performances in terms of accuracy estimation and outcome optimization for NERD on news articles and tweets. While our SRL approach based on PSL (EP-BatchPSL-All) reached the best F1-measure for NERD outcome optimization on tweets, our coordination approach for MEPs (MEP-ACoordH-OnGD-All) outperformed all others with regards to F1-measure on news articles. Similar conditions hold for precision and recall for both data sets, where different proposed learning approaches dominate. The differences with regards to the performances are balanced with differences in terms of computational complexity. We presented lightweight approaches based on the EWH algorithm (e.g. MEP-ACoordH-OnGD-All) with low complexity and – on average – adequate performance, but also developed richer models (e.g. EP-BatchPSL-All) with higher complexities and superior performance. As a consequence, all methods have certain advantages and have to be chosen based on the individual task, available data and potentially available end user preferences (e.g. precision versus recall or available resources for computation).

Our NERD evaluations for Online RL and Batch RL for corresponding EPs answered Research Question 1 & Research Question 2, where the corresponding Hypothesis 1 & 2 can be confirmed.

Our NERD evaluations for a passive coordination approach as well as two active coordination approaches for MEPs answered Research Question 3, where the corresponding Hypothesis 3 can be confirmed.

Part III

Web Automation

This part deals with the problem of Web automation for multi-step tasks with expert advice. We first extend EPs with semantic representations, yielding SEPs, to approach resolving the heterogeneity of experts and data, and subsequently introduce suitable Web components and their interactions for Web automation (Chapter 8). To evaluate the proposed Web components, we apply the latter to two medical assistance- as well as one NLP scenario in Chapter 9.

Chapter 8

Automating Semantic Expert Processes

This chapter deals with the problem of Web automation, i.e. automatically solving multi-step tasks with expert services on the Web. It proposes the SEP framework as well as a Web architecture to solve such tasks. We approach the heterogeneity of expert functionalities, -implementations and -interfaces by lifting experts with lightweight semantics in terms of LAPIs (Hypothesis 4), and then deal with the heterogeneity of experts and data with a decision-theoretic and data-driven approach (Hypothesis 5). The chapter thus deals with Research Question 4 & 5 and Hypothesis 4 & 5, with respect to the research questions and hypotheses presented in Chapter 2. It introduces the conceptual approaches to find, query and combine experts, where Chapter 9 will deal with instantiating and evaluating them for the presented scenarios. Our references for this chapter are [51, 101–103].

8.1 Introduction

Expert services are increasingly published via enabling platforms. Examples include platforms such as Algorithmia, which increase the motivation for developers to publish their experts as RESTful APIs, which can be directly accessed via HTTP calls. These platforms make experts universally and reliably accessible and allow their providers to earn money for single expert executions. This trend is also perceivable in sensitive fields such as medicine, where monetary aspects are not the primary focus, but reuse and -composition of the resulting Web components become increasingly relevant. State-of-the-Art component-based image processors such as MITK offer great potential to support physicians in their daily work by automating the processing of patient-related data, and toolkits offer a high variety of organ simulation functionality, where the medical simulation markup language (MSML) [127] eases the configuration of HiFlow3 [4] simulations in an XML-based format.

To solve multi-step tasks with such APIs, several problems emerge. Exposed interfaces of individual experts usually vary in terms of names or exposed hyperparameters, which renders automatic expert discovery difficult. In addition, expert functionalities might vary in granularity, where some interfaces bundle the functionality of several expert steps at once. Integrating and aligning experts with overlapping functionality is thus

necessary and difficult. Finally, composed expert pipelines potentially generate faulty solutions for several problem instances, as they usually are static (i.e. the same experts are always used) even though feedback (e.g. training sets) about the respective mistakes is available.

By tackling these challenges, we work towards autonomous Web architectures which are able to find, query and weight eligible experts for multi-step tasks, and improve their performance (in terms of correct solutions) over time by exploiting feedback. We build on prior work in the Semantic Web and propose the SEP framework, which extends EPs with semantic representations for states, actions and experts. By making experts available as LAPIs (i.e. semantically enriched Web services; see Definition 3.50) and enabling their data-driven execution with Linked Data-Fu (LD-Fu) [123], they can be easily discovered and used for composition. The resulting set of semantic experts is weighted and composed by semantic meta components, which are semantically lifted Planning and Learning approaches for sequential decision problems.

We evaluate our work on problem settings in medical assistance as well as NLP (Chapter 9), where data sources, such as patient-related information or text, are heterogeneous as they often consist of diverse data distributions. We introduce semantic meta components for both scenarios, which make use of available structured knowledge and training sets to automatically select, weight and execute experts.

8.1.1 Challenges

The main challenges (see Section 1.2) for automating the selection, execution and composition of experts are as follows.

• If descriptions for experts are available to end users, understanding them is often tedious and difficult, as the descriptions are frequently unstructured and short (Challenge 6).

• Interfaces of implemented experts are diverse, which might cause competing experts for the same step to require different input formats, or experts for two subsequent steps to require glue code to be written (Challenge 7).

• Given that an expert has been selected for a step, its automatic execution is difficult, as mappings from data to input parameters might be complex and should not be hard-coded (Challenge 8).

• To enable flexible use of expert services, descriptions about their functionality (e.g. the service class summarizing the functionality) must not be tailored to a single step. However, if general or no functional descriptions are available, one needs to actively find expert compositions from available input data to a predefined goal (Challenge 9).

• When there are multiple experts available for a single step, one needs to deal with conflicting hypotheses, as approached in Part II of this thesis, in a flexible Web architecture (Challenge 10).

8.1.2 Contributions

We now summarize our contributions with respect to the challenges, as introduced in Sec. 1.4.

(i) We exploit Semantic Web technologies to lift expert descriptions with semantic annotations and publish them as LAPIs. The resulting set of semantic experts is easily discoverable and dynamically executable, while using lightweight semantic annotations (Contribution 4).

(ii) We formalize SEPs and disclose central goals for automatically solving multi-step tasks on the Web (Contribution 5.1).

(iii) We present semantic experts and semantic meta components for decision-making in medical scenarios (see Section 9.2). For the semantic TPM task, we present an abstract semantic planner and a grounded semantic planner, which enable to automatically solve the TPM scenario. For Surgical Phase Recognition, we develop a grounded semantic learner, which is able to combine the predictions of semantic experts (Contribution 5.2).

(iv) We disclose novel challenges for SEPs for NLP domains on the Web by lifting NER as well as NED experts and extending medical semantic meta components. More specifically, we develop a novel grounded semantic planner to deal with the availability of numerous decision candidates for NERD (Contribution 5.3).

(v) We evaluate semantic experts as well as semantic meta components with respect to time-efficiency as well as the ability to correctly solve semantic tasks (Contribution 5.4).

The remainder of this chapter is structured as follows. We formalize semantic tasks and introduce SEPs in Section 8.2, and describe the automation components in Section 8.3. Section 8.4 defines how the individual components of our proposed Web architecture are used to automatically solve multi-step tasks on the Web. We discuss SEPs in Section 8.5 and conclude the chapter in Section 8.6.

8.2 Problem Formalization

We now introduce the concept of semantic tasks, which extend our definition of tasks (see Definition 3.1) with concepts of the Semantic Web. A semantic task consists of structured

state-, action- and expert spaces, where a structured space can either be abstract or grounded. An abstract structured space is modeled as a BGP (see Definition 3.48), i.e. a RDF graph with variables. BGPs express various degrees of knowledge about the respective spaces and are used to model generalizable patterns. A grounded structured space, on the other hand, is modeled as a RDF graph (i.e. no variables are allowed).

Definition 8.1 (Semantic Tasks). A (grounded) semantic task extends a task with structured spaces, i.e. is a 3-tuple $(X_{RDF}, B_{RDF}, Y_{BGP})$.

• Let $X_{RDF}$ be the grounded structured input space with $x \in X_{RDF}$ a RDF graph.

• Let $B_{RDF}$ be the grounded structured background knowledge space, i.e. a RDF graph.

• Let $Y_{BGP}$ be the abstract structured output space with $y \in Y_{BGP}$ a BGP.
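To make Definition 8.1 concrete, the following sketch shows what a minimal semantic NERD task instance could look like. The ex: and nerd: namespaces, the concrete resources and the property names are purely illustrative assumptions and are not part of the vocabularies used later in this thesis.

@prefix ex: <http://example.org/instances/> .
@prefix nerd: <http://example.org/nerd-vocab#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

# A grounded input x in X_RDF: a short text published on the Web.
ex:tweet1 rdf:type nerd:Text ;
    nerd:content "Angela Merkel visited Paris." ;
    nerd:author ex:user42 .

# An abstract output y in Y_BGP, written as a BGP with variables:
# every recognized mention should be typed and linked to a KB resource.
# { ?mention rdf:type nerd:EntityMention ;
#            nerd:inText ex:tweet1 ;
#            nerd:linkedTo ?kbResource . }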

The structure of multi-step tasks remains equal, but input, background knowledge and output spaces are assumed to be structured. We extend the formalization of tasks with grounded and abstract elements to account for domain knowledge available before a task is solved. The textual information of NLP tasks is, for example, still available for grounded states and might be transformed to features used for learning expert weights. While input- and background knowledge spaces are grounded, the goal space is considered abstract. The intuition is that no resource exists for the given task, unless it was successfully solved or expert hypotheses are available. Multi-step semantic tasks follow Definition 3.1, where solution functions are lifted analogously to semantic tasks.

To this end, we are interested in three related outcomes: (i) finding possible sequences of experts to solve a given task, (ii) querying experts based on the current state and (iii) integrating expert weight learners to choose better expert hypotheses based on training sets. We argue that semantic tasks can be implemented and solved as semantic liftings of EPs, namely SEPs. For outcome (i) we define Abstract SEPs, where experts are not queried and, thus, grounded states are not derived, but an abstract plan has to be found, comprising expert sequences which solve the semantic task. For outcomes (ii) and (iii), we define Grounded SEPs, where experts have to be chosen, queried and weighted.

With Abstract SEPs, our goal is to develop and integrate appropriate Planning techniques to discover and compose experts for a semantic task. Abstract SEPs are defined on BGP spaces, as experts are not executed but comprise generalizable structured descriptions.

Definition 8.2 (Abstract Semantic Expert Processes). An Abstract SEP extends an EP and is an 8-tuple $(S_{BGP}, A_{BGP}, \gamma, E_{BGP}, R_{BGP}, T_{BGP}, H, C_{EXP})$.

• Let $S_{BGP}$ be the abstract structured state space, where $s \in S_{BGP}$ is a BGP.

• Let $\Gamma_{BGP}$ be the abstract structured starting state distribution, i.e. $\Gamma_{BGP} \in \Delta(S^1_{BGP})$.

• Let $A_{BGP}$ be the abstract structured action space, where $a \in A_{BGP}$ is a BGP.

• Let $E_{BGP}$ be the abstract structured expert space, where $e_{BGP} \in E_{BGP}$ is a function $e_{BGP}: S_{BGP} \rightarrow A_{BGP}$, referred to as expert function, which returns an abstract structured action based on an abstract structured state. Each expert is assigned a weight function $w_{e_{BGP}}: S_{BGP} \rightarrow [0,1]$.

• Let $\gamma$ be the discount factor, as defined for EPs.

• Let horizon $H \in \mathbb{N}^+$ and expert budget $C_{EXP} \in \mathbb{N}^+$ be defined as for EPs.

• Let transition function $T_{BGP}: S_{BGP} \times A_{BGP} \rightarrow S_{BGP}$ and reward function $R_{BGP}: S_{BGP} \times A_{BGP} \rightarrow \mathbb{R}$ be defined for abstract structured state- and action spaces.

As a SEP extends an EP with semantic representations, we assume the same protocol as for EPs (see Definition 5.2 & 5.3) and the same properties of expert weights with respect to future actions (see Definition 5.4). Note that abstract structured states are less expressive, as they represent a class of grounded states. The value functions $V_{BGP}: S_{BGP} \rightarrow \mathbb{R}$ and $Q_{BGP}: S_{BGP} \times A_{BGP} \rightarrow \mathbb{R}$ are defined as for EPs, i.e. they have to be optimized in terms of the current reward and maximal future value. The overall goal for an Abstract SEP is finding policy $\pi_{BGP}: S_{BGP} \rightarrow E_{BGP}$ which maximizes the expected cumulative reward of the decision process, $R^{CUM}_{BGP}$ (see Definition 5.7).

We extend the EP formalization (see Chapter 5) as it captures central problems and challenges of learning to optimize with expert advice for multi-step tasks. All spaces and functions are lifted with respect to structured representations, similar to semantic tasks, and reward and transition functions are defined accordingly. The lifted properties also hold for value functions, expert weight functions and the resulting distributions for abstract structured spaces and functions, and we do not explicitly redefine the latter in this chapter.

As for non-semantic multi-step tasks, we map elements from semantic multi-step tasks directly to Abstract SEPs. The state space for the first horizon is defined as the joint space over BGPs based on $X_{RDF}$ and a subset of the background knowledge $\hat{B}_{RDF}$ (i.e. BGPs describe the latter for a generalized number of instances), i.e. $S^1_{BGP} = X_{BGP} \cup \hat{B}_{BGP}$, where the resulting BGP space contains all necessary information (and only the used variables; see Definition 3.48). State space $S^{H+1}_{BGP}$ then comprises the output space of the semantic multi-step task, i.e. $S^{H+1}_{BGP} = Y_{BGP}$. We now give an example for the transition from the semantic NERD task to an Abstract NERD SEP.

Example 8.1 (Semantic Multi-Step Task to Abstract Semantic Expert Process). We deal with translating the semantic NERD task to an Abstract NERD SEP.

• The semantic NERD task has the following elements: $X_{RDF}$ is the input space over texts (on the Web), where we might model the respective author, the venue or platform, or


the time of posting. $Y_{RDF}$ is the output space over mappings from texts to named entities as well as knowledge graph resources, where one might include annotations about the available texts (as for $X_{RDF}$) or resources. Finally, $\hat{B}_{RDF}$ defines the background knowledge space, containing structured training samples of $(X_{RDF}, Y_{RDF})$.

• The translation to the Abstract NERD SEP is straightforward as available background knowledge is not required. The structured state space thus only comprises information about the texts. Here, the resulting BGP state spaces might enforce the availability of a text or the author. As a consequence, we translate $S^1_{BGP}$ from $X_{RDF}$ by using necessary predicates and objects (such as classes) to model the abstract structure in terms of BGPs. We follow the same procedure for $S^{H+1}_{BGP}$ based on $Y_{RDF}$.

Based on the semantic NERD task transition, we shortly discuss the difference to the semantic TPM task.

• The elements of the semantic TPM task can be directly mapped from NLP to image processing. Only $\hat{B}_{RDF}$, the background knowledge space, might comprise (besides samples as for NERD) learned mask models for image segmentation, which can be directly used for inference.

• The translation from the semantic TPM task to the respective Abstract TPM SEP generally only differs in using background knowledge for state spaces. The abstract state space for the first horizon, $S^1_{BGP}$, needs to comprise necessary mask models such that available experts can exploit the latter.

Given a semantic task and an Abstract SEP, the problem of Abstract Semantic Planning entails only using available structured information of experts to find policy $\pi_{BGP}$ (see Definition 3.43). Other than in RL, we do not need to learn or interact with the world, as the structured spaces allow us to enumerate all possible paths, and rewards can be modeled to be exclusively positive for reaching the respective output spaces.

Definition 8.3 (Abstract Semantic Planning). Assuming a learned Abstract SEP model, i.e. learned or given $R_{BGP}$ and $T_{BGP}$, or learned or given expert weight functions $w_{e_{BGP}}$, we need to find policy $\pi_{BGP}$ to generate a semantic plan with start state $s^1_{BGP}$, which is a sequence $s^1_{BGP}, a^1_{BGP}, s^2_{BGP}, a^2_{BGP}, \ldots, s^{H+1}_{BGP}$.

Given that an abstract plan has been generated and environmental feedback for expert queries is available, we can approach defining and solving Grounded SEPs. Other than in the abstract case, we need to query experts based on grounded RDF spaces and use expert weighting methods for selection.

Definition 8.4 (Grounded Semantic Expert Processes). A Grounded SEP extends an EP and is an 8-tuple $(S_{RDF}, A_{RDF}, \gamma, E_{RDF}, R_{RDF}, T_{RDF}, H, C_{EXP})$.

124 .Termiigcalnei hst nert appropriate integrate to thus SEPs. is Grounded challenge into remaining techniques The RL s. SEP in states of elements structured R reward cumulative expected the maximize to order in erigetisetmtn xetwih ucin w functions weight expert estimating entails learning 8.5 Definition define. now we which EPs, otisalmdldifraint ecietegone eatcts o nisac,i.e. instance, an for i.e. task task, semantic grounded the describe S to information horizon, modeled first all the contains for space state space The RDF SEPs. Abstract to analogous hc aiie h xetdcmltv eado h eiinprocess, decision Defini- the (see of 5.7). states reward policytion cumulative grounded structured finding expected is grounded the SEP future maximizes Grounded a which for for goal value overall The maximal 5.4). and tion reward current weights the expert of to respect with properties same egt n eutn itiuin o btatsrcue pcsadfntos n onot functions do state-value The and expert functions, chapter. functions, and this value Q in spaces to latter structured respect the abstract with redefine for properties explicitly distributions and resulting protocol and same weights the assume again We RDF 1 BGP t er oiybsdo h un- the on based policy a learn to 6 & 5 Chapter in proposed techniques RL use can We h rbe of problem The apn lmnsfo eatcmlise tasks multi-step semantic from elements Mapping ihgone tutrdsaespaces. state structured grounded with s EP extends SEPs Grounded of formalization The LtE Let • LtA Let • Lttasto ucinT function transition Let • Let • Let • S Let • LthrznH horizon Let • = : ae nagone tutrdsae ahepr sasge egtfnto w function weight a assigned is expert Each S state. structured grounded a on based S A S RDF RDF X → BGP S RDF γ Γ RDF H R RDF RDF → RDF X + ∈ RDF → × . RDF 1 ∪ [ A A 0 = R B Gone eatcLearning) Semantic (Grounded etegone tutrdsaesae hr s where space, state structured grounded the be ˆ stegone tutrdsatn tt itiuin i.e. distribution, state starting structured grounded the is etegone tutrdato pc,weea where space, action structured grounded the be BGP , RDF etegone tutrdepr pc,weee where space, expert structured grounded the be RDF 1 Y n usto akrudknowledge background of subset and edfie o ruddsrcue tt-adato spaces. action and state- structured grounded for defined be ] RDF ruddSmni Learning Semantic Grounded EPs. for defined as factor, discount the be eerdt sepr ucin hc eun ruddsrcue action structured grounded a returns which function, expert as to referred , → tt space State . ∈ . R N + n suethe assume and 5.3) & 5.2 Definition (see EPs on based extended are n xetbde C budget expert and RDF S RDF H : S + RDF 1 sbsdo h uptsaeo h eatcmulti-step semantic the of space output the on based is × A RDF . EXP w ruddsemantic grounded SEP, Grounded a Given e → RDF steaaoyt xetwih enn in leaning weight expert to analogy the is ∈ S e ..te aet eotmzdi terms in optimized be to have they i.e. , ( RDF N RDF X RDF CUM RDF B + ˆ o l eatceprse experts semantic all for RDF EPs. for as defined be n eadfnto R function reward and , . B space RDF resulting the where , RDF ∈ , ∈ . rbe Formalization Problem 8.2 S Y RDF BGP S A ∈ RDF 1 RDF E ) V graph. RDF a is RDF sdfie ae on based defined is , RDF is SEPs Grounded to π Γ graph. RDF a is RDF RDF R RDF CUM : safnto e function a is S : ∈ RDF S RDF RDF ∆ seDefini- (see RDF ( S → : RDF 1 → ∈ S R RDF A E e ) RDF 125 . RDF RDF and × : :


Figure 8.1: Schematic overview of automation components.

For a given Grounded SEP, the problem of finding a grounded plan (i.e. policy $\pi_{RDF}$) is referred to as Grounded Semantic Planning and entails enabling correct expert queries for a given state.

Definition 8.6 (Grounded Semantic Planning). Given a Grounded SEP, a grounded semantic plan with start state $s^1_{RDF}$ is a sequence $s^1_{RDF}, a^1_{RDF}, s^2_{RDF}, a^2_{RDF}, \ldots, s^{H+1}_{RDF}$, where we have to find a respective state $\hat{s}^h_{RDF}$ for expert $e^h_{RDF}$ such that $a^h_{RDF} = e^h_{RDF}(\hat{s}^h_{RDF})$.

We now introduce the different components we disclosed to tackle prior defined learning and planning problems in both Abstract- and Grounded SEPs within a Web-based architecture.

8.3 Web Components for Automation

Our Web architecture comprises four core automation component types (see Figure 8.4). We will now explain their respective functionality.

1. A Structured Knowledge Base

2. Semantic Experts

3. Semantic Meta Components

4. A Data-Driven Execution Engine (Semantic Expert Advice Agent)

8.3.1 Structured Knowledge Base

A KB, such as a triple store or a virtually integrated collection of triples, provides and stores data, metadata and a common controlled vocabulary, both modeled with RDF. Linked Data principles (see Section 3.4.5) suggest to provide sufficient information for resources to be available for lookups, and URIs have to be persistent. The KB thus essentially comprises modeled or reused ontologies, which are used to annotate data and experts, and appropriate concepts to enable the integration of novel semantic experts and semantic meta components. Hence, all triples of either grounded or abstract state-, action- and expert spaces are in the KB and can be reached via link traversal.

8.3.2 Semantic Experts

In our Web architecture, experts are deployed as Web services. We follow the concept of LAPIs (see Definition 3.50) to make semantic experts easily accessible. The resulting semantic experts refer to the usage and origin of the expert by reusing concepts of the KB. The standardized description defines how to execute and how to communicate with the semantic expert. A minimal set of information for such a description is summarized in Table 8.1.

Table 8.1: Minimal description for semantic experts.

Functional requirements: Inputs (Preconditions), Outputs (Postconditions)
Non-functional requirements: Domain Experts, Service Endpoint, Example request & response, Expert class

Functional requirements deal with required conditions for the inputs and outputs of the semantic expert, i.e. we exclusively require structured annotations about inputs and outputs. Non-functional requirements define metadata which are not needed for using the semantic expert. The expert class denotes an exception to this rule, as it is situated in a taxonomy for general semantic expert discovery. The service endpoint denotes the resource used for issuing HTTP requests. Other non-functional requirements comprise domain experts (i.e. developers of the expert functionality), example requests and -responses (i.e. exemplary groundings of the usage) as well as provenance metadata to support end users. Preconditions and postconditions, here, define strict rules about the states before- and after executing the semantic expert. As a consequence, we do not use the expert preconditions and postconditions (or further generated structured annotations about the available functionality) to directly solve


SEPs, but plan and learn to infer the adequacy of a semantic expert. We now generally define semantic experts (see Definition 8.7).

Definition 8.7 (Semantic Expert). A semantic expert is a 5-tuple $(\text{Preconditions}_e, \text{Postconditions}_e, \text{Description}_e, e_{BGP}, e_{RDF})$.

• Let $\text{Preconditions}_e \in S_{BGP}$ be the precondition BGP.

• Let $\text{Postconditions}_e \in S_{BGP}$ be the postcondition BGP.

• Let $\text{Description}_e$ be the RDF graph with all functional and non-functional requirements.

• Let $e_{BGP}$ be the abstract expert function, which evaluates $\text{Preconditions}_e$ on a grounded state and generates an abstract resulting state without querying the expert.

• Let $e_{RDF}$ be the grounded expert function, which evaluates $\text{Preconditions}_e$ on a grounded state and generates a grounded action by querying the expert.

An intuitive example of an arbitrary image processor is given in Figure 8.2. The semantic expert converts images in PNG format to JPEG. It requires (via its preconditions) that inputs have to be typed with kbont:Image with format kbont:PNG, while outputs need to have format kbont:JPEG. We use the rdf:type predicate to model the expert class, and model the predicates kbont:contributor and kbont:exampleRequest for annotating domain experts and example requests. Note that we use kb and kbont as generic namespaces to describe instances and conceptual elements (i.e. properties or classes) available in the knowledge base 1. The MSM namespace 2 corresponds to the minimal service model [73], which advocates and enables lightweight semantics for Web services. Listing 8.1 summarizes the used namespaces for this chapter.

@prefix kbont: .
@prefix kb: .
@prefix msm: .
@prefix rdf: .
@prefix dc: .

Listing 8.1: (Generic-) namespaces used for KB concepts and instances.
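Putting these conventions together, the description of the expert in Figure 8.2 could be published roughly along the following lines. Only rdf:type (expert class), kbont:contributor, kbont:exampleRequest, the image-related terms and the instance kb:SemanticImageConversionExpert1 are taken from the surrounding text and Listing 8.2; the class kbont:ImageConversionExpert, the properties kbont:precondition, kbont:postcondition and kbont:serviceEndpoint, the namespace URIs and the endpoint URI are illustrative assumptions rather than the exact vocabulary used in this thesis.

@prefix kbont: <http://example.org/kbconcepts#> .
@prefix kb: <http://example.org/kbinstances/id/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

kb:SemanticImageConversionExpert1
    rdf:type kbont:ImageConversionExpert ;    # expert class in the KB taxonomy (assumed class name)
    kbont:contributor kb:developer1 ;         # domain expert / developer (hypothetical instance)
    kbont:exampleRequest kb:exampleRequest1 ; # exemplary grounding of the usage (hypothetical instance)
    kbont:serviceEndpoint <http://example.org/experts/imageConversion> ;  # assumed property and URI
    kbont:precondition """{
        ?image rdf:type kbont:Image ;
               dc:format kbont:PNG .
    }""" ;                                    # precondition BGP (assumed property name)
    kbont:postcondition """{
        ?converted rdf:type kbont:Image ;
                   dc:format kbont:JPEG ;
                   kbont:convertedFrom ?image .
    }""" .                                    # postcondition BGP (assumed property name)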

Further note that the degree of detail for preconditions and postconditions is often dependent on the wrapping process of the respective expert, as hyperparameters or generated metadata might be hidden (and thus not exposed). The latter is usually dependent on the intended range of applications for the semantic expert. Larger numbers of hyperparameters and metadata might foster reusability of the semantic expert, but also complicate its usage. The same holds

1 We will point out which exact ontologies we used in the respective scenarios.
2 http://iserve.kmi.open.ac.uk/ns/msm/msm-2014-09-03.html (accessed on 05/01/2018)

for the used or modeled properties or classes, where higher degrees of complexity for inputs, preconditions, outputs and postconditions increase applicability, but require more efforts for integration.

Figure 8.2: An exemplary semantic image conversion expert.

8.3.3 Semantic Meta Components

We now deal with semantic meta components, which integrate Planning and Learning methods (see Part II) into the Web automation architecture. Semantic meta components for SEPs are solutions to abstract semantic planning, grounded semantic planning or grounded semantic learning. Our Web architecture enables the flexible use, testing and exchanging of semantic meta components by requiring their availability as LAPIs and, if reasonable, structured annotations as for semantic experts.

Definition 8.8 (Semantic Meta Components). A semantic meta component is a semantic expert, i.e. a 5-tuple $(\text{Preconditions}_e, \text{Postconditions}_e, \text{Description}_e, e_{BGP}, e_{RDF})$, which implements a strategy for abstract semantic planning, grounded semantic planning or grounded semantic learning.

Similar to a semantic expert, a semantic meta component specifies the amount of information it requires by its preconditions and is only called if the agent can provide all information. Examples for such preconditions are performance tables containing a history of validated results on training sets, or a list of applicable states for the usage of a semantic meta component.


8.3.4 Semantic Expert Advice Agent

Until now, we described the required representations and implementations of semantic experts, semantic meta components, as well as the KB. To this end, we need a flexible mechanism for discovering and executing the respective LAPIs. Our goal for Web automation is to compose semantic experts in order to automatically solve semantic tasks. Resulting compositions have to be sufficiently dynamic with respect to available inputs for the semantic task such that the best set of experts is correctly queried. We take a data-driven perspective and have the preconditions of semantic experts or semantic meta components determine when the respective LAPI has to be executed. To this end, we exploit LD-Fu, a data-driven execution engine which is used as baseline for composing semantic experts.

Linked Data-Fu

LD-Fu [123] is a rule-based execution engine, describing and implementing a formalism to virtually integrate data from different sources in real-time. Rules are defined in N3 syntax (see Section 3.4.3) and consist of a general IF-THEN structure, where a set of rules is referred to as linked program. The IF condition of a rule (also referred to as head) has to comprise a BGP or be empty, expressing the starting point of a linked program. The THEN clause then defines the action to be conducted given the head is fulfilled. Our focus, here, lies on the ability of LD-Fu to use HTTP methods, which enables us to dynamically execute available LAPIs. LD-Fu fosters scalability of Web applications by enabling parallel queries of LAPIs. It relies on polling data sources to query LAPIs by a linked program, which is a contrary approach to event-based methods where triggers originate from the environment.
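As a rough impression of the rule format, the following N3-style sketch requests a LAPI whenever its precondition pattern matches the polled data. The http:/httpm: request vocabulary, the endpoint URI and the body handling are simplifying assumptions for illustration and do not reproduce the exact terms expected by LD-Fu.

@prefix http: <http://www.w3.org/2011/http#> .
@prefix httpm: <http://www.w3.org/2011/http-methods#> .
@prefix kbont: <http://example.org/kbconcepts#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

# IF the current data contains a PNG image, i.e. the expert's precondition holds ...
{
    ?image a kbont:Image ;
           dc:format kbont:PNG .
}
=>
# ... THEN issue a POST request with the grounded precondition to the expert's service URI.
{
    [] http:mthd httpm:POST ;
       http:requestURI <http://example.org/experts/imageConversion> ;
       http:body ?image .
} .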

Linked Data-Fu for Semantic Experts

In our Web automation architecture, LD-Fu first searches eligible semantic experts for grounded states. By exploiting semantic meta components, we can then learn about semantic experts and plan correct semantic expert executions to solve semantic tasks. Each semantic expert is represented as a single rule, which we automatically generate based on its description and the current grounded state. The execution of a semantic expert is documented by publishing provenance as Linked Data. See Listing 8.2 for the exemplary generated provenance of the semantic image conversion expert.

130 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 for Agent vice ol uoaial ov e eatctsswtotadtoa aulefr,i semantic we if experts, effort, semantic manual diverse additional of without number tasks match. rules growing semantic descriptions defining a new simply with solve by important, automatically uses possible more automatically could generally agent Even is the This sets, LD-Fu. execution. training for for with data trained training be annotated to all have experts in semantic with If deal geous. we which preconditions complex are there seman- cases, executing 9.3.2. use automatically Section enables numerous this for While experts HTTP URI. service tic a its expert, to semantic issued the is preconditions of preconditions the fulfills state egn ntnitdi u e rhtcueas architecture Web our in instantiated engine LD-Fu the to refer We advanta- highly is data structured and experts semantic between matching automatic The grounded a When expert. semantic simple a for process general this summarizes 8.3 Figure kbont:GroundedPostcondition; rdf:type kbont:GroundedPrecondition; _:o rdf:type _:i kbont:SemanticExpertExecution; rdf:type kb:execution1 http://purl.org/dcelements/1.1/> < sparql: @prefix xsd: @prefix dc: http://example.org/kbinstancesid> < @prefix rdf: @prefix msm: @prefix http://example.org/kbconcepts#> < kb: @prefix kbont: @prefix Learning i short (in Eepaypoeac o eatciaecneso expert. conversion image semantic for provenance Exemplary 8.2: Listing and Planning eatcagent semantic d:au """{ rdf:value """{ rdf:value "2017-11-29T22:18:00"^^xsd:dateTime; _:o. _:i; kbont:output kbont:input dc:date kbont:expert biae d:yekbont:Image; rdf:type kb:image2 kbont:Image; rdf:type kb:image1 otk eiin o ovn ut-tptsswt eatcexperts. semantic with tasks multi-step solving for decisions take to .Tesmni gn xlissmni eacomponents meta semantic exploits agent semantic The ). cfra kbont:JPEG; kbont:convertedFrom dc:format kbont:PNG. dc:format kb:SemanticImageConversionExpert1; }"""^^sparql:GraphPattern. }"""^^sparql:GraphPattern. kb:image1. . e opnnsfrAutomation for Components Web 8.3 POST eus ihgrounded with request eatcEpr Ad- Expert Semantic 131


Figure 8.3: A LD-Fu rule for the semantic image conversion expert.

Definition 8.9 (Semantic Expert Advice Agent). A semantic expert advice agent solves multi-step tasks with available semantic experts by solving Abstract- and Grounded SEPs with abstract semantic planners, grounded semantic planners and grounded semantic learners, available as semantic meta components.

Semantic experts and semantic meta components can be discovered by their type as well as their preconditions and postconditions. However, a good selection of the latter is based on two aspects, namely finding conceptually eligible components or experts and finding well-performing components or experts, given the current state. We now deal with the respective problem settings in detail.
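For the discovery by type mentioned above, a simple query against the KB suffices. The class kbont:SemanticExpert and the property kbont:precondition in this sketch are illustrative assumptions and stand in for whatever expert taxonomy and description vocabulary the KB actually provides.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX kbont: <http://example.org/kbconcepts#>

# Retrieve all experts of a given (sub-)class together with their precondition
# patterns, so the agent can match them against the current grounded state.
SELECT ?expert ?precondition WHERE {
    ?expert rdf:type/rdfs:subClassOf* kbont:SemanticExpert ;
            kbont:precondition ?precondition .
}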

8.4 Solving Semantic Tasks with Semantic Meta Components

We now explain the required interplay between semantic meta components to solve abstract as well as grounded semantic tasks with Abstract SEPs and Grounded SEPs, respectively. We first address abstract semantic planning, where we ignore state- and action groundings and exclusively exploit descriptions of semantic experts and states. The second case deals with grounded semantic learning and builds on semantic experts selected via abstract semantic planning. As dealt with in Part II of this thesis, we need to learn weights for experts to find the optimal hypothesis. In the grounded semantic planning setting, we need to automatically find expert configurations to solve a multi-step task for a given grounded starting state and abstract goal.

In Figure 8.4 we give a generic overview of interactions between the semantic agent and semantic meta components, semantic experts and the KB. The semantic agent first queries the knowledge base to get all available semantic experts and then calls an abstract semantic planner. It executes the resulting set of semantic experts and subsequently calls a grounded semantic learner, capable of assessing and combining the generated candidate outputs. The

132 t rcniin.W ecieoeipeetto fa btatsmni lne nSec- in planner semantic abstract an of implementation one describe We preconditions. its goal the reaching support latter the if checking by query amount we large state. experts An a semantic be of state. set might a the there for reduce as experts state, semantic goal eligible the conceptually reaching of help which experts against a evaluated semantic state generates be trigger therefore stating can agent a which semantic evaluates rules The first of experts. set one semantic of SEP, matching Abstract precondition corresponding terms in and task semantic a For Planners Semantic Abstract 8.4.1 im- be might and experts grounded selected execute to plemented needed is planner semantic grounded ilsrtsteaetrl ocl h btatsmni lne eedn on depending planner semantic abstract the call to rule agent the illustrates 8.4 Figure within eatcaetservice. agent semantic a Itrcin fsmni eacomponents. meta semantic of Interactions 8.4: Figure . ovn eatcTsswt eatcMt Components Meta Semantic with Tasks Semantic Solving 8.4 s BGP 1 ecno sue oee,ta ue only rules that however, assume, cannot We . btatsmni planner semantic abstract sal to able is s 133 BGP 1

We describe one implementation of an abstract semantic planner in Section 9.2.2 for the TPM scenario.

8.4.2 Grounded Semantic Learners

In the grounded semantic learning setting, we want to solve a Grounded SEP by having access to several exchangeable semantic experts which solve the same step (i.e. we solve the Learning problem; see Part II). Eligible semantic expert candidates are known (e.g. due to an abstract semantic planner), but one has to deal with uncertainty about their expected performances given a grounded state $s_{RDF}$. A grounded semantic learner selects semantic experts before their execution by estimating their performance, which is crucial when large numbers of semantic experts are eligible for a Grounded SEP. Figure 8.4 describes the generic rule to execute any available grounded semantic learner. Section 9.2.3 introduces a semantic learner for medical phase recognition which can be reused for other tasks, such as NERD.

8.4.3 Grounded Semantic Planners

Other than in abstract semantic planning, grounded semantic planning requires information about the grounded state $s_{RDF}$ of the respective Grounded SEP. Grounded semantic planners are needed when preconditions and postconditions of a LAPI do not suffice to solve a task, as state representations or semantic expert descriptions are too complex. A grounded semantic planner is thus executed to find correct configurations for available input parameters of the chosen semantic experts. However, we might need to repeatedly plan for solving a Grounded SEP, namely potentially after each step is processed. We thus assume that the grounded semantic planner is part of the implementation of the semantic agent. We present a grounded semantic planner for the semantic NERD task in Section 9.3.

8.5 Discussion

We first discuss the added value of the SEP framework and then deal with the more general goal of developing self-adaptive systems.

8.5.1 On the Added Value of SEPs

Modeling fine-grained semantic annotations, and developing and integrating meta components, adds manual work for domain- and machine learning experts. Still, the resulting semantic meta components enable to gradually learn which semantic experts to select given state characteristics and how to automatically compose them. Pipelines are thus dynamically changed for each state (and decision candidate) to maximize the correctness of task solutions, whereas they usually remain static unless manually changed (e.g. via fine-grained manual preconditions).

This also holds for automatically taking into account novel semantic experts with adequate semantic interfaces.

8.5.2 On the Goal of a Self-Adaptive System

Since we enable to choose among different semantic meta components for learning, one could have them compete for planning as well. This, however, is what Vilalta & Drissi [136] depict as the curse of infinite bias. We want the Web architecture to be partially self-adaptive by automatically improving with experience, which is achieved by considering novel data sources (e.g. training sets) or semantic experts. However, each semantic meta component has its bias with respect to a certain used methodology or its training process. Hence, while it is important to account for this bias, the problem of how to deal with different semantic meta components which compete with each other is interesting on its own.

8.6 Summary

In this chapter, we lifted EPs towards Web automation, where available semantic experts are discovered, weighted and queried in order to solve semantic tasks without human intervention. We first defined semantic tasks, where input-, output and background knowledge spaces are considered to be RDF graphs. To solve semantic tasks, we defined Abstract- as well as Grounded SEPs, extending EPs with BGP and RDF representations, and the resulting abstract semantic planning, grounded semantic planning and grounded semantic learning problems. We then defined a suitable Web architecture, which consists of a KB, semantic experts, semantic meta components and a data-driven execution engine, which we refer to as semantic agent.

Semantic experts were defined as LAPIs, where lightweight semantic annotations were proposed for modeling preconditions and postconditions. Our work on semantic experts addressed Research Question 4, where the corresponding Hypothesis 4 is partially confirmed and will be revisited after our applications and evaluations of semantic experts for the medical assistance scenarios as well as the NERD use case.

Based on semantic meta components, we can flexibly discover suitable experts for a SEP with abstract semantic planners, configure their executions with grounded semantic planners and learn expert weights with grounded semantic learners. By using LD-Fu as rule-based execution engine within the semantic agent, we can query semantic meta components and eventually execute semantic experts. Our work on the Web architecture for the automation problem addressed Research Question 5, where the corresponding Hypothesis 5 is partially confirmed and will be revisited after our applications and evaluations of the Web architecture with diverse semantic meta components for the medical assistance scenarios as well as the NERD use case.


Chapter 9

Applications & Evaluations of Semantic Expert Processes

This chapter extends Chapter 8 with respect to implementations and interfaces to fully confirm Hypothesis 4 & 5, where the goal of Web automation in terms of providing decision-theoretic and data-driven automation is approached. It deals with Research Question 4 & 5, providing applications and evaluations of the proposed SEP framework, where semantic liftings for LAPIs are proposed. Our references for this chapter are [51, 101–103], where we additionally note the application of the proposed methods in [114].

9.1 Introduction

SEPs lift EPs to richer state-, action- and expert representations and constitute an important step towards the automation of multi-step tasks in Web-based environments, where such tasks are very common (e.g. in NLP or image processing). To solve semantic tasks with SEPs, we require semantic meta components, namely extensions to the learning methods proposed in Part II, which are grounded semantic learners, as well as abstract semantic planners and grounded semantic planners. The semantic agent manages the communication among semantic meta components, and flexibly discovers and correctly executes the resulting semantic experts on a data-driven basis.

In this chapter, we apply the Web automation framework to two different fields, namely medical assistance and NLP. For medical assistance, we deal with two scenarios, each requiring different semantic meta components. Surgical phase recognition is an intra-surgical, single-step task where available sensor information has to be mapped to single surgical phases. Based on available training sets, we deal with grounded semantic learning. The second scenario deals with TPM, more specifically with medical image processing, where a number of headscans of a single patient are automatically processed. Here, we deal with abstract semantic planning as well as grounded semantic planning to select and compose adequate experts. Finally, the NLP scenario entails to solve NERD, where named entities have to be extracted from text and linked to KBs. We will deal with the joint task of abstract semantic planning, grounded semantic planning and grounded semantic learning.


Both fields pose distinct challenges for semantic tasks and serve as proof-of-concepts for SEPs. We provide additional evaluations for properties of semantic experts (e.g. time performance) and developed semantic meta components (e.g. precision or recall for grounded semantic learners). We summarized the challenges and contributions in Chapter 8.

9.2 Medical Assistance Scenarios

We introduce applications of our Web automation framework to two medical assistance scenarios, namely Surgical Phase Recognition and TPM. Before introducing the respective semantic experts and semantic meta components, we present the shared Web automation architecture for medical assistance.

9.2.1 Web Automation for Medical Assistance Tasks

Our Web architecture for medical assistance tasks was developed within the Cognition-Guided Surgery project 1. Every expert considered in the scenarios was lifted and published as semantic expert. We modeled all semantic descriptions with domain experts and developers of the experts, and integrated them in a shared instance of a Semantic MediaWiki (SMW) [78], referred to as Surgipedia.

SMWs extend MediaWikis 2 with structured representations, where a single page corresponds to a unique resource. The MediaWiki syntax is extended to deal with RDF, such that triples can be directly edited, added or deleted. The latter is facilitated by using templates and forms, which are intuitive means to enable end users to quickly model and publish RDF triples. A SMW also allows to integrate available ontologies and maps them to locally modeled concepts or predicates, which eases reuse for end users. Finally, as SMW pages can be looked up using RESTful content negotiation to return RDF via HTTP GET requests, one can easily fulfill all recommended Linked Data Principles (see Section 3.4.5) by using SMWs to structure data.

Figure 9.1 illustrates the developed SMW form for end users to generate semantic descriptions for their semantic experts (see Section 8.3.2). Preconditions and postconditions have to be modeled with additional forms, as the latter often correspond to complex BGPs. To this end, all semantic annotations are based on Medical Subject Headings (MeSH) 3, the Radiology Lexicon (RadLex) 4, SNOMED CT 5 and the Foundational Model of Anatomy (FMA) 6. We use an instance of XNAT 7 to store patient-relevant data and provide a RDF

1 http://www.cognitionguidedsurgery.de/ (accessed on 05/01/2018)
2 https://www.mediawiki.org/wiki/MediaWiki (accessed on 05/01/2018)
3 https://meshb.nlm.nih.gov (accessed on 05/01/2018)
4 https://www.rsna.org/RadLex.aspx (accessed on 05/01/2018)
5 http://www.snomed.org/snomed-ct (accessed on 05/01/2018)
6 http://si.washington.edu/projects/fma (accessed on 05/01/2018)
7 http://www.xnat.org/ (accessed on 05/01/2018)

wrapper which lifts XNAT with mappings to semantic concepts. The KB can thus be considered as the union of Surgipedia (where all ontologies have been integrated) and its links to other resources, such as XNAT. The complete architecture with two semantic meta components is illustrated in Figure 9.2.

Figure 9.1: The SMW form to annotate experts for medical assistance.

9.2.2 Semantic Tumor Progression Mapping

The TPM scenario consists of five image processing steps, where several headscans of a single patient are processed and spatially aligned to generate an intuitive overview of the tumor progression (as introduced in Section 2.3), in order to support radiologists in their daily diagnostic and treatment work. For our application, we use six experts of the MITK library, which are summarized in Table 9.1 together with the needed inputs and produced outputs. If not indicated otherwise, the experts expect and produce files in the NRRD format. For step Brain Normalization, one available expert (Standard Brain Normalization) requires a brain image and a brain mask generated by the Brain Mask Generation expert, while a second expert (Robust Normalization) additionally requires manually created, finer-grained brain mask annotations. As a consequence, the Standard Brain Normalization expert can always be queried within the pipeline, but manually created image annotations are needed for the Robust Normalization expert.

9 Chapter 9 Applications & Evaluations of Semantic Expert Processes

Figure 9.2: The automation architecture for medical assistance.

Expert Input Output Brain Mask Generation Headscan (NRRD or MHA) Brain image Atlas image Brain mask Atlas mask Standard Brain Normalization Brain image Normalized brain image Brain mask Robust Brain Normalization Brain image Normalized brain image Minimum brain mask Maximum brain mask Exclude brain mask Batched Folder Registration Normalized brain image Registered brain image Tumor Segmentation Registered brain image Brain segmentation Map Generation Registered brain image Map (PNG) Brain segmentation

Table 9.1: Needed inputs and generated outputs for TPM.

140 13 12 11 10 13 12 11 10 9 8 7 6 5 4 3 2 1 9 8 7 6 5 4 3 2 1 definition. LAPI the by recommended as headscan, input the to link to have specific patient-specific a that specify tions and that and format A in Appendix source in presented as are typed are experts postconditions TPM and for pre- se- presented descriptions for individually semantic postconditions all and that preconditions Note step modeled process. the the generation of discuss TPM experts the shortly mantic in we step illustration, every an for experts As semantic published and developed We Experts TPM Semantic task. TPM semantic the for experts semantic eligible finds format. which NRRD planner, the semantic in available PNG are single files a output aggre- into and patient TPM input same The residual the All latter. of file. the masks without brain tumor-segmented) TPM (possibly a multiple create gates can and segmentations respective generate semantic A abstract our present then and semantics with experts TPM lifted we how illustrate first We banmg d:yespc:BrainImage; spc:Headscan; rdf:type spc:MHA) = ?format ?brainImage || spc:NRRD rdf:type = (?format FILTER . ?headscan dc: . @prefix rdf: @prefix spc: @prefix spp: @prefix spc:BrainAtlasMask; spc:BrainAtlasImage; rdf:type spc:Headscan; ?brainAtlasMask spc:MHA) rdf:type = ?format ?brainAtlasImg || spc:NRRD rdf:type = (?format FILTER ?headscan dc: @prefix rdf: @prefix spc: @prefix ri ta mask atlas brain uptbanmask brain output Peodtoso h eatcBanMs eeainexpert. Generation Mask Brain semantic the of Preconditions 9.1: Listing ri akGeneration Mask Brain spc:NRRD cfra spc:NRRD; ?headscan. spp:headscan dc:format ?format. dc:format spc:NRRD. spc:NRRD. dc:format dc:format ?format. dc:format fe epciesmni xethsbe ure,tepostcondi- the queried, been has expert semantic respective a After . ri akGeneration Mask Brain ftesm omthv ob vial.Bt eeae images generated Both available. be to have format same the of or spc:MHA uptbanimage brain output xetrqie ye ain-pcfi edcnre- headscan patient-specific typed a requires expert n eei,patient-unrelated generic, a and , 9.2. & 9.1 Listing in depicted are which , ihformat with . eia sitneScenarios Assistance Medical 9.2 sparql:GraphPattern spc:NRRD ri ta image atlas brain n patient- a and . 141

9 Chapter 9 Applications & Evaluations of Semantic Expert Processes

14 ?brainMask rdf:type spc:BrainMask; 15 dc:format spc:NRRD; 16 spp:headscan ?headscan. Listing 9.2: Postconditions of the semantic Brain Mask Generation expert.

To this end, it is important to discuss a central property of preconditions and postconditions we refer to as multiplicity. The needed inputs and outputs for the Brain Mask Generation ex- pert are unambiguous with respect to the used ontologies, as – for the stipulated preconditions – a single headscan, single atlas image and single atlas mask is needed. However, experts might be required to process an a priori unknown number of inputs. The resulting multiplic- ity problem is available for the Brain Registration expert, which spatially aligns all available normalized brain masks of a single patient. To express this property in preconditions and postconditions, we propose to use the avail- able rdf:Bag class for collections 8, where included resources are unordered. Listing 9.3 shows the resulting preconditions of the step, where a number of normalized brain masks are required as input, which are part of a rdf:Bag. 1 @prefix spp:. 2 @prefix spc:. 3 @prefix rdf:. 4 @prefix rdfs:. 5 @prefix dc:. 6 7 ?norm_bag rdf:type rdf:Bag. 8 9 ?normMask rdf:type ?norm; 10 rdfs:member ?norm_bag; 11 dc:format spc:NRRD; 12 spp:headscan ?headscan; 13 spp:brainMask ?brainMask. 14 FILTER (?norm = spc:NormalizedBrainMask || 15 ?norm = spc:RobustNormalizedBrainMask). 16 17 ?brainMask rdf:type spc:BrainMask; 18 dc:format spc:NRRD; 19 spp:headscan ?headscan. 20 21 ?headscan rdf:type spc:Headscan; 22 dc:format ?format. 23 FILTER (?format = spc:NRRD || ?format = spc:MHA) Listing 9.3: Preconditions of the semantic Brain Registration expert.

To characterize and define the respective bags, we model a single representative re- source of it in preconditions or postconditions. For the semantic Brain Registration ex- pert, we define that the bag requires resources of type spc:NormalizedBrainMask or spc:RobustNormalizedBrainMask, for which property rdfs:member is used to

8https://www.w3.org/TR/rdf-schema/#ch_bag (accessed on 05/01/2018)

142 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 tr)i eerdt as to referred is at store) data structured aggregating of struc- the process to The according files. extended and raw XNAT of as interface Principles well REST as the information from reused structured was available ture to link finally which sources 9.4. Listing in depicted is 8.3.4 . rule Section resulting in the expert, described Generation generally Mask as Brain hold, semantic preconditions the their For if executed only are experts agent property bag use a alternatively where can triple, one that reciprocal Note the model resource. a of membership the define ntemdclassac cnro easm a assume we scenario, assistance medical the to In points . KB which the from data input eprs enwda ihtereeuinb the by execution their with deal now we experts, TPM semantic available on Based nadto orlsfrsmni xet,w edrlst ur n grgt h needed the aggregate and query to rules need we experts, semantic for rules to addition In }. spc:BrainAtlasMask; httpm:POST; http:mthd _:a { => } spc:BrainAtlasImage; rdf:type spc:Headscan; ?brainAtlasMask spc:MHA) rdf:type = ?format ?brainAtlasImg || spc:NRRD rdf:type = (?format FILTER ?headscan http://www.w3org/2011/#> < http: @prefix httpm: @prefix dc: @prefix rdf: @prefix exp: @prefix spc: @prefix { egnrt e frls(..alne rga) hc ttsta epciesemantic respective that states which program), linked a (i.e. rules of set a generate We . }. { http:body exp:BMG; http:requestURI dpcstesto ue o ahrn ain-eae nomto.The information. patient-related gathering for rules of set the depicts 9.5 Listing . bantaMs d:yespc:BrainAtlasMask; spc:BrainAtlasImage; rdf:type spc:Headscan; ?brainAtlasMask spc:MHA) rdf:type = ?format ?brainAtlasImg || spc:NRRD rdf:type = (?format FILTER ?headscan Rl o h eatcBanMs eeainexpert. Generation Mask Brain semantic the for Rule 9.4: Listing department cfra spc:NRRD. spc:NRRD. dc:format dc:format ?format. dc:format ita aaintegration data virtual cfra spc:NRRD. spc:NRRD. dc:format dc:format ?format. dc:format eore nacii.Telte on oindividual to point latter The clinic. a in resources osssof consists runtime . ohrta urigasnl,goa triple global single, a querying than (other resources. . eia sitneScenarios Assistance Medical 9.2 starting ikdData Linked rdf:li patient semantic resource 143 re- to

9 Chapter 9 Applications & Evaluations of Semantic Expert Processes

1 @prefix rdf:. 2 @prefix vocxnat:. 3 @prefix xnat:. 4 @prefix httpm:. 5 @prefix http:. 6 7 { 8 _:a http:mthd httpm:GET; 9 http:requestURI xnat:xnatlist. 10 } 11 12 { 13 ?list vocxnat:hasProject ?project. 14 } => { 15 _:a http:mthd httpm:GET; 16 http:requestURI ?project. 17 }. 18 19 { 20 ?project vocxnat:hasSubject ?subject. 21 } => { 22 _:a http:mthd httpm:GET; 23 http:requestURI ?subject. 24 }. 25 26 { 27 ?subject vocxnat:hasFile ?file. 28 } => { 29 _:a http:mthd httpm:GET; 30 http:requestURI ?file. 31 }. Listing 9.5: Linked program for gathering patient-related resources.

Having gathered and integrated the needed set of triples to query all semantic experts, goal- oriented plans are needed to (i) find semantic experts which solve respective steps of an Ab- stract TPM SEP and (ii) query the latter with correct inputs to solve a Grounded TPM SEP. We now show that we can reduce respective SEPs to MDPs in order to directly use available Planning techniques for the abstract and the grounded case.

An Abstract Semantic Planner for TPM By assuming a controlled vocabulary for all semantic experts, we can reduce both Abstract TPM SEPs as well as Grounded TPM SEPs to MDPs. We first discuss task properties of Abstract- and Grounded TPM SEPs, and then deal with their reduction. TPMs are finite multi-step tasks, where we can confidently set an upper bound to the number of steps needed to generate solutions. They are, however, not always layered, as several steps need to be repeated if multiple headscans of a single patient are available. An Abstract TPM SEP is finite and layered, as we are only interested in finding abstract plans to reach an abstract goal state. A Grounded TPM SEP is finite and non-layered to account for multiplicity. We

144 )adtegone ae( case grounded the and SEPs) Abstract M 3.34), Definition (see MDP the horizon in SEPs Grounded 9.1 with Definition deal and SEPs Abstract for s MDP section. infinite subsequent to SEPs reduce now re ofidapeeeto fsmni xet o h epciesmni ak etu do thus We 9.2 Definition task. semantic respective the property. for multiplicity experts the semantic with deal of not preselection a find to order tions. M MDP edsigihbtenteasrc ae( case abstract the between distinguish We ,w xlctyrdc h opeiyo ie tr-adga ttsin states goal and start- given of complexity the reduce explicitly we SEPs, Abstract For LtTr Let • LtR Let • Let • A Let • S Let • S • Ato pc A space Action • Tedfiiino Tr of definition The • a add we task, TPM semantic the of goal the define To • l vial eatceprs hr tt s state a where experts, s semantic available all epr r ece,gnrtsalo,i.e. loop, a generates reached, are expert TPM semantic s the of preconditions the when e functions expert s i.e. a, action generates which expert semantic r-adpscniin)tu lasrslsi s visited in prior results potentially with always preconditions thus of postconditions) union and (i.e. pre- s state in expert) semantic (i.e. a). 0 SEP ABSTR = SEP ABSTR = γ Tr Tr SEP SEP SEP MDPs. for defined as hyperparameter, factor discount the SEP SEP ABSTR SEP ABSTR sdfie seueaino l nqepeodtosadpscniin of postconditions and preconditions unique all of enumeration as defined is ,w en h olwn rprisfrissae n func- and spaces its for properties following the define we 9.1), Definition (see Rdcino btatSP oIfiiehrznMDPs) Infinite-horizon to SEPs Abstract of (Reduction Rdcino Est nnt-oio MDPs) Infinite-horizon to SEPs of (Reduction h tt space. state the : h cinspace. action the : S S SEP ( SEP ( s s , , a SEP ABSTR × a ) × ) . A BGP A hoeial otisteuino r-adpeodtoso the of preconditions and pre- of union the contains theoretically – SEP SEP SEP ,i tt-eedn n ossso ure fabstract of queries of consists and state-dependent is EPs, in as , o h epciesae ..A i.e. state, respective the for → ssrihfrad si sdtriitc hoiga cina action an Choosing deterministic. is it as straightforward, is → R h rniinfunction. transition the S h eadfunction. reward the SEP M SEP GRND ( = M S SEP SEP ABSTR SEPs). Grounded for (as spaces RDF with ) , A SEP ihecuieyasrc pcs(sfor (as spaces abstract exclusively with ) 0 = 0 , ie no fswt rcniin of preconditions with s of union (i.e. γ 0 Preconditions SEP ABSTR , eutn rmatasto i.e. – transition a from resulting Tr . eia sitneScenarios Assistance Medical 9.2 . dummy SEP t ninfinite- an to SEP a reduce We = , R { SEP a | e cint A to action . BGP ) ie infinite-horizon Given . ( e s ∪ = ) e BGP a SEP ABSTR } . ( s ) . which, 145

9 Chapter 9 Applications & Evaluations of Semantic Expert Processes

ABSTR ABSTR ABSTR ABSTR • The reward function RSEP is a SSEP × ASEP matrix with |ASEP | = |E| + 1 (see Equation 9.1).

( 1 if a equals dummy expert and s equals goal RABSTR(s,a) = (9.1) SEP 0 otherwise By using any strategy to plan in the MDP (e.g value iteration), we efficiently find eligible semantic experts to solve the task. The resulting abstract semantic planner takes as input semantic experts, respective patient- or background information and a goal state, and returns a set of eligible semantic experts. The corresponding semantic agent rule for the abstract semantic planner is depicted in Listing 9.6. Here, the preconditions for the semantic Brain Mask Generation expert (see Listing 9.1) would replace the state. 1 @prefix rdf:. 2 @prefix sep:. 3 @prefix exp:. 4 @prefix httpm:. 5 @prefix http:. 6 7 { 8 ?col rdf:type rdf:Bag; 9 rdf:li ?exp. 10 ?exp rdf:type sep:SemanticExpert. 11 ?goal rdf:type sep:GoalState, 12 sep:GroundedState. 13 ?start rdf:type sep:StartState, 14 sep:AbstractState. 15 } => { 16 _:a http:mthd httpm:POST; 17 http:requestURI exp:abstractplanner; 18 http:body 19 { 20 ?col rdf:type rdf:Bag; 21 rdf:li ?exp. 22 ?exp rdf:type sep:SemanticExpert. 23 ?goal rdf:type sep AbstractState, sep:GoalState. 24 ?start rdf:type sep:GroundedState, sep:StartState. 25 }. 26 }. Listing 9.6: LD-Fu rule for executing the abstract semantic planner.

A Grounded Semantic Planner for TPM An abstract semantic planner discovers suitable semantic experts for a task, but cannot deal with challenges resulting from grounding preconditions for available states in RDF. We al- ready discussed the problem of multiplicity, where semantic experts might require an un- known number of inputs for their input parameters. To illustrate further challenges, consider

146 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 cnit of consists TPM generated a namely As collections, for resources. 9.7. class Listing different in a unordered task ploit an TPM masks, brain semantic processed a timely-ordered of goal abstract the ntga1rftp e:btattt,sep:GoalState; sep:AbstractState, rdf:type xnat:goal1 sep: @prefix http://www.w3org/TRrdf-sparqlquery/#> < sparql: @prefix http://www.w3org/2000/01/rdf-schema#> < dc: @prefix rdfs: @prefix http://aifb-ls3vm2.kitedu:8080/xnatwrapper/id> < rdf: @prefix xnat: @prefix http://www.w3org/nssawsdl#> < sawsdl: @prefix spc: @prefix spp: @prefix 9 https://www.w3.org/TR/rdf-schema/#ch_seq sep:AbstractCondition; a [ d:au """{ rdf:value nrMs2rftp ?norm; { OPTIONAL || spc:NormalizedBrainMask = ?norm; (?norm FILTER rdf:type ?normMask2 rdf:type spc:RegisteredBrainMask; ?normMask1 rdf:type spc:RegisteredBrainMask; ?reg_mask2 rdf:Seq; rdf:Bag. spc:TumorProgressionMap; rdf:type rdf:type ?reg_mask1 rdf:type rdf:type ?tpm_seq ?reg_bag ?tpm1 Eepayga fasmni P task. TPM semantic a of goal Exemplary 9.7: Listing nrMs2spp:manualSegmentation ?normMask2 spp:manualSegmentation ?normMask1 }}"""^^sparql:GraphPattern]. sawsdl:modelReference nr spc:RobustNormalizedBrainMask). = ?norm cfra spc:NRRD; xnat:headscan2. spp:headscan spc:NRRD; dc:format xnat:headscan1. spp:headscan dc:format ?reg_bag; spc:NRRD; spp:normalizedMask xnat:headscan2; rdfs:member spp:headscan dc:format ?reg_bag; spc:NRRD; spp:normalizedMask xnat:headscan1; ?reg_mask2. rdfs:member ?reg_mask1; spp:headscan dc:format rdf:_2 rdf:_1 ?tpm_seq. spp:mapContent rdf:Seq xnat:seg2. xnat:seg1. ?brainMask2. ?brainMask1. rdf:Bag ( cesdo 05/01/2018 on accessed 9 . eia sitneScenarios Assistance Medical 9.2 hc are which , sntsial.W hsex- thus We suitable. not is ordered ) eune of sequences 147

9 Chapter 9 Applications & Evaluations of Semantic Expert Processes

The goal BGP defines that three headscans have to be used to generate the TPM, consisting of two registered brain masks. The semantic agent has to correctly query a semantic Brain Mask Generation expert and a semantic Brain Normalization expert for each headscan (and resulting brain mask) individually to then register all normalized brain masks and create the final TPM once. Given a large amount of headscans for a patient, where only two headscans should be chosen for the resulting TPM, we need to first ground all possible states with respect to available preconditions of semantic experts and then plan with respect to the goal. In addition to grounding all possible expert configurations for a start state, we need to specifically deal with multiplicity, i.e. the availability of containers where multiple inputs of the same type are possible state spaces. An example for such containers was presented for the preconditions of semantic Brain Registration experts (see Listing 9.3), where a bag of brain masks with different headscans of a single patient is required for execution. We thus have to instantiate all possible bag configurations (i.e. bags of different sizes with different brain masks) to be able to correctly plan. GRND We can now define the elements of MSEP , the infinite-horizon MDP for Grounded SEPs (Definition 9.3). Definition 9.3 (Reduction of Grounded SEPs to infinite-horizon MDPs). Given infinite- GRND horizon MDPM SEP (see Definition 9.1), we define the following properties for its spaces and functions. GRND MSEP is a reduction of a Grounded SEP, where (i) grounded expert functions eRDF gener- ate grounded action hypotheses aRDF ∈ ARDF as well as grounded states sRDF ∈ SRDF or (ii) abstract expert functions eBGP are queried to generate abstract action hypotheses aBGP ∈ ABGP as well as abstract states aBGP ∈ ABGP. GRND 1 H+1 MSEP is instantiated by calling initializeGroundedPlanning(sRDF,πABSTR,E,sBGP ) 1 (Algorithm4) with the available start state s RDF, learned abstract policy πABSTR, expert set E H+1 and available goal sBGP . We first describe an algorithm for retrieving all valid groundings based on a semantic ex- pert, which is needed to tackle the problem of multiplicity (see Algorithm3), before presenting GRND the general procedure to retrieve the model for MSEP (see Algorithm4), as used in Defini- tion 9.3. The details of the algorithm are as follows: • line 1: The collector set is initialized. It will eventually consist of all possible variable bindings for an expert’s preconditions and a state.

• line 2-5: For all bags available in the precondition, we must find all possible configu- rations. We therefore assume that preconditions exactly define resources of the bag in order to query the state for all possible resources.

• line 6-10: For all available candidate resources with respect to the bag, we construct possible configurations and add them to the collector set.

148 rcdr oinstantiate to procedure 3 Algorithm 11: 10: 12: 9: 8: 7: 6: 5: 4: 3: 2: 1: h eal fteagrtmaea follows: as are general algorithm The the expert. of and details state The a for configuration possible all find to strategy a have now We • • • return S for end o all for collector , l lgbeeprs(nodro h tp) o ahepr,w ntaietbe ftem- (grounded of expert tables the initialize executing we from expert, result each which For transitions and steps). actions the states, of porary order (in experts eligible all il ie o ifrn nu ofiuain ihnasnl step. single a within configurations input different for times tiple ts,where task, TPM respective the solving for experts eligible abstractPlan comprises thus and plan) 5-8: line tables. as represented are 1-4: line function transition The expert. ( the for configurations input s eutn state resulting A postconditions its via retrieved configuration, input (i.e. respective the space for state executed returned is expert The expert. the for tions all method derive specifically, now collector More can (i.e. we expert state. bags, rent the and all preconditions on getGroundings the based for configurations groundings additional possible the and bags) 11-12: line A s CONF o all for candidates q tempCollector collector for end , , Tr a ← , bag s Postconditions S tempCollector ← 0 sapsil nu ofiuain h cinspace action The configuration. input possible a is getQuery ) , getValidGroundings A ae ntesmni xet hr h eatcepr a eeeue mul- executed be can expert semantic the where expert, semantic the on based n getGroundings , ∈ Spaces Tr ← { ∈ abstractPlan Preconditions ae nalrsda aibe ntepeodtos(hc r o within not are (which preconditions the in variables residual all on Based .Terslso h ur hncmrs l dqaeiptconfigura- input adequate all comprise then query the of results The ). /0 ← 1 ← h ,..., collector ( s S eun h xetfrstep for expert the returns M bag 0 Preconditions queryState and ← rmeeuigteepr sdfie as defined is expert the executing from SEP GRND ure h ie tt ae ntepeodtoso h cur- the of preconditions the on based state given the queries | candidates ) /0 ← A e h btatfnto fteexpert the of function abstract the – ) swl sfunctions as well as 4. Algorithm in summarized is ( tempCollector collector orsod oalandasrc oiy(..abstract (i.e. policy abstract learned a to corresponds e ∪ do ( tempCollector ( e q , , s | s ) e do ) oetal ruddb a ofiuain (in configurations bag by grounded potentially ) , e , s ) ∪ h ae nteasrc ln elo over loop we plan, abstract the on Based . subsets S Tr ossso h eutn ttsatrthe after states resulting the of consists and . eia sitneScenarios Assistance Medical 9.2 R ( n r ntaie,weetelatter the where initialized, are , candidates s A 0 Tr = opie h respective the comprises s ipycpue tuples captures simply e ∪ BGP e BGP stu executed. thus is ) ( s CONF ) where , 149

9 Chapter 9 Applications & Evaluations of Semantic Expert Processes

Algorithm 4 initializeGroundedPlanning(s,abstractPlan,E,goal) 1: S ← {s} 2: A ← /0 3: Tr ← /0 4: R ← /0 5: for all h ∈ 1,...,abstractPlan do 6: tempStates ← /0 7: tempActions ← /0 8: tempTransitions ← /0 9: for all s ∈ S do 10: STEMP,ATEMP,TrTEMP ← getValidGroundings(abstractPlanh,s) 11: tempStates ← tempStates ∪ STEMP 12: tempActions ← tempActions ∪ ATEMP 13: tempTransitions ← tempTransitions ∪ TrTEMP 14: end for 15: S ← S ∪ tempStates 16: A ← A ∪ tempActions 17: Tr ← Tr ∪ tempTransitions 18: S,A,Tr,R ← setGoalBasedRewards(S,A,T,goal) 19: end for 20: return S,A,Tr,R

150 n pcfiain:QM ita P eso ..,26,9 H,4G A,Ubuntu RAM, GB 4 MHz, follow- 2266,796 the 0.9.1, with Version LTS. machine Jena CPU 12.04.4 virtual Virtual Apache a QEMU in as deployed specifications: well been ing as have JAX-RS experts the expose processing with of image only implementation JavaEE semantic and the used For experts we respective TPMs. experts, generating the domain eventually semantic wrap with for closely correctly needed worked to are we which MITK parameters process, descrip- of lifting textual developers) the short For where (i.e. calls, experts exist. line parameters command available standard for via tions experts to access offers MITK time execution the of terms in concept LAPIs. expert resulting semantic the the of of suitability the evaluate now We Evaluation the Performance make Time to have 2 LAPIs factor As eliminating thus resources, bags. involved the respective determine) outputs of to resources inputs from the relationship about knowledge any have addts(..hasa eore nasae.Ta s ntewrtcs igebgresults bag single a case worst As the in lengths task. is, of That semantic combinations all state). the in a produces of in resources – steps headscan unconstrained (e.g. of candidates when number – O the which within on Generator is bound algorithm, Mask several complexity Brain a Although the the set steps. e.g. result, confidently of – a number once can the than one more for – executed bound be expert a to H have with might experts maximum, semantic at times H repeated is 9.1. Analysis Complexity Setup: not do we where case general most the depict complexity its and algorithm the that Note 10 • • https://jena.apache.org/ ,i endbsdo h olsae(i.e. state goal the on based defined is 9.1), anal- Equation function, reward (see The case goal planning. abstract for the used be to can ogous and returned finally are functions step only finished 15-20: should The we they line after state. as actions a sets, and and state action eventual expert and the an state to for added temporary actions be respective and to states added grounded are correctly outcomes all get to called Method is inputs. as states generated prior 9-14: line abstract). or ). 9.1 Table in summarized (as library MITK the through accessed were experts All hr um cini de rmga tt oga tt ihrwr fone. of reward with state goal to state goal from added is action dummy a where ) ie nepr ihnteasrc ln ehv otk noacutall account into take to have we plan, abstract the within expert an Given h nihdsaeadato pcsa ela h rniinadreward and transition the as well as spaces action and state enriched The h loih a ob xctdfraleprsprsae which state, per experts all for executed be to has algorithm The 1 ,..., ( cesdo 05/01/2018 on accessed explicit | ( D | E | h vrl opeiyi hswti O within thus is complexity overall The . || S | H ecnesl eue(n nms ae exactly cases most in (and reduce easily can we , ) h ubro steps of number The . getValidGroundings 2 ) | D | . eia sitneScenarios Assistance Medical 9.2 ttswt h e fdecision of set the D with states | D | . | h S sitoue before, introduced as , . | sdpneto the on dependent is 10 h resulting The . ( | E || S | 2 | D | 151 H ) .

9 Chapter 9 Applications & Evaluations of Semantic Expert Processes

Experiment Execution time (in seconds) Valid result Local 592.48 Yes Semantic 803.26 Yes

Table 9.2: Evaluation results for local versus semantic expert executions.

To compare the runtimes, we first manually executed the experts as available prior to the lifting process, i.e. on a local machine via command line tools of MITK. We then ran each semantic expert with respective RDF graph inputs as gathered by the semantic agent. The inputs where gathered via the XNAT RDF wrapper within the presented Web architecture (see Figure 9.2) and fed back into the platform after each step. We used two test patient samples with two headscans (in NRRD format), where available background information consisted of an atlas image (in NRRD format) as well as an atlas mask (in NRRD format). We averaged the time needed for processing both headscans. Results: Table 9.2 shows the resulting time needed to run each (semantic-) expert in sec- onds (s). The TPM generation process was successful for using the descriptions of the individual semantic image processing experts and an initial implementation of LD-Fu without an abstract semantic planner. The semantic (i.e. automatic) execution of the semantic experts by the proposed semantic agent produced the same (and thus correct) results as the manual local execution. While the resulting overhead for the semantic TPM task is about 210 seconds, this does not impede the diagnostic process of radiologists as the TPM can be generated in advance. The overhead is mainly produced by downloading and uploading the required and produced files from and to XNAT. We continue to evaluate the outputs of the proposed abstract semantic planner.

Semantic Planning Evaluation We first evaluate our abstract semantic planner in terms of its ability to exclusively find se- mantic experts needed for the abstract semantic TPM task. We then generalize the results to grounded semantic planning for grounded semantic TPM tasks. Semantic Abstract Planning Setup: The abstract semantic planner instantiates an infinite- ABSTR ABSTR ABSTR horizon MDP MSEP and automatically constructs TrSEP and RSEP based on available semantic experts, the TPM goal and given patient- as well as background information. Con- sider a grounding for the semantic Brain Mask Generation expert and an abstract goal describ- ing a TPM (see Listing 9.7) with all paths to the TPM being possible. Besides the 6 semantic experts involved in the TPM process, the action space consists of 2 semantic phase recogniz- ers of the phase recognition scenario we present in Section 9.2.3. For the proof-of-concept, we use discount factor of λ = 0.9 and perform value iteration (see Algorithm1) for a given patient with two headscans (in NRRD format), additional segmentation for Robust Brain Nor-

152 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 9.9. Listing in state, depicted respective goal is The the planner zero. abstract reach of semantic values to state the enable assigned of potentially are output experts latter recognition the phase if semantic the zero where than the greater for values (except state goal discloses the reach to executed be 9.3. Table in summarized are which iterations, 6 after values state converged start the summarizes 9.8 Listing (in format). NRRD image (in atlas mask an atlas of an state. consisting as well information as background format) and NRRD format) NRRD (in malization ttswt ausgetrta eodpc rcniin fsmni xet hc aeto have which experts semantic of preconditions depict zero than greater values with States eatcAsrc lnigResults: Planning Abstract Semantic sep:StartState; sep:GroundedState, rdf:type xnat:start1 sep: @prefix http://www.w3org/TRrdf-sparqlquery/#> < sparql: @prefix http://www.w3org/1999/02/22-rdf-syntaxns#> < dc: @prefix rdf: @prefix xnat: @prefix http://www.w3org/nssawsdl#> < sawsdl: @prefix spc: @prefix spp: @prefix sep:GoundedCondition; a [ d:au """{ rdf:value ntsg d:yespc:ManualSegmentation; spc:ManualSegmentation; rdf:type rdf:type xnat:seg2 spc:BrainAtlasMask; spc:Patient. xnat:seg1 spc:BrainAtlasImage; rdf:type xnat:brainAtlasMsk rdf:type rdf:type xnat:brainAtlasImg spc:Headscan; xnat:patient1 rdf:type spc:Headscan; xnat:headscan2 rdf:type xnat:headscan1 sawsdl:modelReference Satsaeo eatcTMtask. TPM semantic a of state Start 9.8: Listing }"""^^sparql:GraphPattern]. p:ain xnat:patient2. spc:NRRD; spp:patient dc:format xnat:patient1. spc:NRRD; spp:patient spc:NRRD. dc:format spc:NRRD. dc:format dc:format xnat:patient1. spp:patient spc:NRRD; spp:manualSegmentation dc:format xnat:patient1. spp:patient spc:NRRD; spp:manualSegmentation dc:format hnrnigvleieainon iteration value running When dummy xnat:seg2; xnat:seg1; cin.A eut au trto only iteration value result, a As action). . eia sitneScenarios Assistance Medical 9.2 M SEP TPM ederive we , 153

9 Chapter 9 Applications & Evaluations of Semantic Expert Processes

State number State description State value 1 Brain Mask Generation 0.32805 2 Standard Normalization 0.3645 3 Robust Normalization 0.3645 4 Batched Folder Registration 0.405 5 Tumor Segmentation 0.45 6 Map Generation 0.45 7 Dummy 1.0 8 Rule-based Phase Recognition 0.0 9 ML-based Phase Recognition 0.0

Table 9.3: Evaluation results for abstract semantic planning.

1 @prefix spc:. 2 @prefix xnat:. 3 @prefix rdf:. 4 @prefix rdfs:. 5 @prefix exp:. 6 @prefix sep:. 7 8 _:pbag rdf:type rdf:Bag, sep:PolicyBag. 9 10 _:pol1 rdf:type rdf:Seq; 11 rdfs:member _:pbag; 12 rdf:_1 exp:BMG; 13 rdf:_2 exp:BN; 14 rdf:_3 exp:BR; 15 rdf:_4 exp:TS; 16 rdf:_5 exp:MG. 17 18 _:pol2 rdf:type rdf:Seq; 19 rdfs:member _:pbag; 20 rdf:_1 exp:BMG; 21 rdf:_2 exp:RBN; 22 rdf:_3 exp:BR; 23 rdf:_4 exp:TS; 24 rdf:_5 exp:MG. 25 26 exp:BMG rdf:type sep:SemanticExpert, spc:BrainMaskGeneration. 27 exp:BN rdf:type sep:SemanticExpert, spc:BrainNormalization. 28 exp:RBN rdf:type sep:SemanticExpert, spc:RobustBrainNormalization. 29 exp:BR rdf:type sep:SemanticExpert, spc:BrainRegistration. 30 exp:TS rdf:type sep:SemanticExpert, spc:TumorSegmentation. 31 exp:TPM rdf:type sep:SemanticExpert, spc:MapGeneration. 32 33 xnat:goal1 rdf:type sep:State, sep:GoalState, sep:AbstractState. 34 xnat:start1 rdf:type sep:State, sep:StartState, sep:GroundedState. Listing 9.9: Abstract semantic TPM plan.

154 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 cmrssteotu ftegrounded the of output the comprises 9.10 step Listing for planner individually. semantic headscans both for expert We times. multiple executed be might on expert each iteration step value each run for where steps, all for executions instantiate we state, factor goal discount and experts with available plan, abstract state, start TPM xnat:headscan1 ihpstv ausspott ec h ol h cin orahteesae r state step are in states case evaluation these TPM reach our to For actions executed. The be to goal. need h the which experts reach for to configurations support values positive with expert semantic the executing the using namely as by possible, well are once executions as expert 9.8 ) executions Generation expert Mask Listing semantic Brain (see generating state by starts start step. planner respective for each semantic and of grounded experts 9.7 the semantic plan, Listing abstract of in preconditions goal the the ground Assuming gradually to plan semantic stract = xnat:grounding2 spc:BrainMaskGeneration; sep:SemanticExpert, rdf:Bag. rdf:type rdf:type xnat:grounding1 exp:BMG _:ebag sep: @prefix exp: @prefix dc: @prefix . sparql: @prefix rdfs: @prefix rdf: @prefix xnat: @prefix . sawsdl: @prefix spc: @prefix spp: @prefix Results: Planning Semantic Grounded Setup: Planning Semantic Grounded h ,w edt aetoatos aeyeeuigtesmni ri akGeneration Mask Brain semantic the executing namely actions, two take to need we 1, = ,ie ri akGnrto.Gvntesat n olsae ubro semantic of number a state, goal and start- the Given Generation. Mask Brain i.e. 1, sep:GoundedCondition; a [ d:au """{ rdf:value xnat:headscan1 ntbantaMkrftp spc:BrainAtlasMask; spc:Patient. spc:BrainAtlasImage; rdf:type xnat:brainAtlasMsk rdf:type spc:Headscan; rdf:type xnat:brainAtlasImg xnat:patient1 rdf:type xnat:headscan1 e:xetexp:BMG; spc:ExpertExecution; sawsdl:modelReference sep:expert rdf:type _:ebag. rdfs:member λ and M = SEP GRND h xnat:headscan2 0 }"""^^sparql:GraphPattern]. . = cnit falpsil asrc)smni expert semantic (abstract) possible all of consists MDP The 9. ofida prpit plan. appropriate an find to 1. neb using by once , cfra spc:NRRD. spc:NRRD. dc:format dc:format xnat:patient1. spc:NRRD. spp:patient dc:format o ruddsmni lnig eruea ab- an reuse we planning, semantic grounded For yrnigvleieainon iteration value running By o the on 4 Algorithm running By individually. xnat:headscan2 . eia sitneScenarios Assistance Medical 9.2 rtieb using by twice or , M M SEP GRND SEP GRND l states all , TPM for 155

9 Chapter 9 Applications & Evaluations of Semantic Expert Processes

31 rdf:type spc:ExpertExecution; 32 sep:expert exp:BMG; 33 sawsdl:modelReference 34 [ a sep:GoundedCondition; 35 rdf:value """{ 36 xnat:headscan2 rdf:type spc:Headscan; 37 dc:format spc:NRRD. 38 spp:patient xnat:patient1. 39 xnat:patient1 rdf:type spc:Patient. 40 xnat:brainAtlasImg rdf:type spc:BrainAtlasImage; 41 dc:format spc:NRRD. 42 xnat:brainAtlasMsk rdf:type spc:BrainAtlasMask; 43 dc:format spc:NRRD. 44 }"""^^sparql:GraphPattern]. 45 46 xnat:goal1 rdf:type sep:State, sep:GoalState, sep:AbstractState. 47 xnat:start1 rdf:type sep:State, sep:StartState, sep:GroundedState. Listing 9.10: Output of the grounded semantic planner for the step h = 1. In general, the process needs to be repeated for each step h ∈ 1...,H as we need to learn expert weights based on the respective state groundings. For TPM without learning, we could find an appropriate grounded plan by planning once on the grounded start state and use the BGP state configurations for later steps. We will deal with a different case in Section 9.3 for NERD.

9.2.3 Semantic Surgical Phase Recognition In surgical phase recognition (see Section 2.2), one aims to predict the current surgical phase by various sensory information – in our scenario the latter are restricted to activity triples consisting of the used surgical instrument, the performed action and the treated human or- gan. We base our application and evaluation of phase recognition on two experts, i.e. (i) An expert based on manually defined rules in the Semantic Web Rule Language (SWRL) for a domain ontology [63] and (ii) a supervised learning-based expert (referred to asML-based phase recognition expert), fitting a random forest model [23] for multiclass classification with annotated surgeries as training data (i.e. surgical activities labeled with the correct surgical phase). We first present the lifting process for semantic phase recognition experts and then proceed with our grounded semantic learner.

Semantic Surgical Phase Recognition We lifted both phase recognition experts to semantic experts, and defined their inputs and outputs in terms of semantic pre- and postconditions. See Listing 9.11 for the preconditions of the semanticML-based phase recognition expert and note that all semantic descriptions for Surgical Phase Recognition experts are presented in AppendixA. The semantic phase recognition experts need to be initialized with a separate laparoscopic ontology with concepts for the surgical setting. This is due to the implementation of the expert

156 13 12 11 10 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 9 8 7 6 5 4 3 2 1 vial eatceprscoe ya btatsmni lne n ute rpe bu the about triples further and planner assumes semantic only abstract It an learner. by semantic chosen grounded experts the List- semantic it executing available preconditions. for As rule general resulting less result. the weight define depicts best we to 9.13 experts, the able ing recognition choose is phase eventually and semantic to II for works experts Part only in recognition proposed phase methods semantic on competing based the is learner semantic grounded Our Recognition. Phase Surgical for Learner Semantic Grounded A has addition, in recognizer, phase -based ML The samples. expert. with semantic trained the be to to lifting the as well as hc nue htol oee hssocradrcgie hssbln oteeetof event the to belong phases recognized and occur preconditions. phases the modeled only that ensures which eetrftp spc:SurgicalEvent; spc:Phase; rdf:type rdf:type ?event ?phase dc: @prefix rdf: @prefix spc: @prefix spp: @prefix spc:Action. as typed spc:TreatedStructure. be to have results that state 9.12) spc:Instrument. Listing (see postconditions rdf:type The rdf:type spc:SurgicalEvent; rdf:type ?structure spc:Ontology. ?action rdf:type ?instrument rdf:Bag; rdf:type spc:Surgery. ?event rdf:type ?ontology rdf:type ?trainingSample ?col http://www.w3org/1999/02/22-rdf-syntaxns#> < http://surgipedia.sfb125de/wikiSpecial <:URIResolver/Category-3A>. rdf: @prefix http://surgipedia.sfb125de/wikiSpecial <:URIResolver/Property-3A>. spc: @prefix spp: @prefix bsdexpert. ML-based semantic the of Postconditions 9.12: Listing bsdexpert. ML-based semantic the of Preconditions 9.11: Listing p:hs ?phase. ?action; ?structure. spp:phase spp:structure spp:action ?instrument; spp:instrument ?action; ?structure. spp:structure spp:action ?instrument; spp:instrument p:au ?value. spp:value ?trainingSample. rdf:li . eia sitneScenarios Assistance Medical 9.2 spc:Phase 157 ,

9 Chapter 9 Applications & Evaluations of Semantic Expert Processes

activities. Note that we can elegantly define the generality of the grounded semantic learner based on the concept types we use. If it was able to optimally choose among two or more image processors as well, we could easily express that via preconditions and postconditions. 1 @prefix sep:. 2 @prefix spc:. 3 @prefix rdf:. 4 @prefix httpm:. 5 @prefix exp:. 6 7 { 8 ?col rdf:type rdf:Bag; 9 rdf:li ?exp. 10 ?exp rdf:type spc:PhaseRecognitionExpert. 11 ?grounding rdf:type sep:State. 12 } => { 13 _:a http:mthd httpm:POST; 14 http:requestURI exp:groundedlearner; 15 http:body 16 { 17 ?col rdf:type rdf:Bag; 18 rdf:li ?exp. 19 ?exp rdf:type spc:PhaseRecognitionExpert. 20 ?grounding rdf:type sep:State. 21 }. 22 }. Listing 9.13: LD-Fu rule for executing the grounded semantic learner. The grounded semantic learning component assesses the performance of a given semantic phase recognizer based on training samples close to the state (i.e. decision candidate for the phase recognition scenario). Our initial approach is a batch learning variant of Single-Avg as introduced for evaluation in Chapter7 using single-expert meta dependencies (as introduced in Sec 5.3) without further weightings of the latter. To get batch training samples, one trains theML-based phase recognizer on parts of the training sets (i.e. all surgeries but one) and predicts on the remaining surgery. The process is repeated until we have predictions for all surgeries. For the rule-based phase recognizer, one simply has to execute the latter on all left out surgeries, as no training takes place. Finally, the optimal hypothesis is generated based on the weighted combination of available semantic experts.

Grounded Semantic Learner Evaluation We evaluate the performance of the grounded semantic learner for surgical phase recognition by running the automation architecture and quantifying the outcome optimization performance of the resulting grounded phase recognition SEP. Setup: Kernels: We define and exclusively use a straightforward kernel for surgical activities based on the exact pairwise match of structure, instrument and/or action. If for two activities the current triples plus their three predecessors match, the activities get assigned a similarity of

158 9.2. Equation in of defined structure is kernel and The action one). instrument, by one (i.e. decreasing identical are states time the at of triples activity states past two of and value kernel The 0 otherwise. to 0 decreases value linearly similarity The 1. ecnatmt h epciesmni ak.Utlnw eol explored considering only of we problem now, the Until where task, tasks. semantic single-step semantic a respective for the learning automate semantic grounded can we TPM, for semantic ning respective the and SEP learner executed. recognition correctly semantic phase were grounded grounded experts recognition the the phase importantly, as More automated, successfully compete. to was able was or recognizer weight expert batch Batch-Local. exemplary which denoted Results: Our is precision, evaluation set. was test the used the for we for approach metric predictions learning evaluation correct of The average as one. defined residual is the on predicted and eries measures Evaluation 9.4 Table in summarized Experts ae ngone eatclann o ugclpaercgiinadasrc plan- abstract and recognition phase surgical for learning semantic grounded on Based phase best the outperformed either learner semantic grounded the of precision resulting The Batch-Local Expert ML-based SWRL Expert t h aeiepromne ftetoaalbeeprsfrpaercgiinare recognition phase for experts available two the of performances baseline The : witnas (written 9.5. Table in summarized are results The Eauto eut o h ruddsmni learner. semantic grounded the for results Evaluation 9.5: Table Bsln efracso hs eonto experts. recognition phase of performances Baseline 9.4: Table κ ugr 1 Surgery 0 0 ( 0 1 Surgery s s epromdfiefl rs aiain hr etando orsurg- four on trained we where validation, cross five-fold performed We : . . t 1 9062 9315 . 1 9332 r dnia oisrmn,ato n tutr of structure and action instrument, to identical are ) , s 2 = )                0 0 1 0 0 . . . 25 5 75 0 0 2 Surgery 0 2 Surgery . . 6635 7753 . 7786 s s x x x t t 1 1 = = = . 5i nytecretatvt rpe ac n has and match triples activity current the only if 25 6= = 0 0 0 s s , , , t t 2 2 : 1 1 1 ∧ , , : 2 2 s s 0 0 3 Surgery , s t 1 0 3 Surgery t 1 : 3 . . 1 − − 9032 89 . s 9180 1 x and t 1 − s 6= = x t 1 − = s s s x t t 2 2 2 − − = s stu ihi h epciecurrent respective the if high thus is 1 x t 2 . eia sitneScenarios Assistance Medical 9.2 − ∧ s x t 2 − s ∧ t 1 x 0 0 4 Surgery − 0 4 Surgery s . . 2 t 1 4484 8137 . − 7782 6= 3 6= s t 2 − s 2 t 2 − 3 s 2 0 0 5 Surgery ttime at 0 5 Surgery . . 6383 7241 . 7238 t with , (9.2) 159 s 1 t

9 Chapter 9 Applications & Evaluations of Semantic Expert Processes future impacts when taking decisions is not prevalent. As a consequence, a central challenge for grounded semantic planning did not occur, namely dealing with a large amount of con- flicting groundings, due to exchangeable semantic experts. To this end, we present semantic meta components for a NLP scenario, where prior men- tioned complexities are available. Here, the grounded semantic learning technique can be completely reused, but the semantic description for the respective grounded semantic learn- ers has to be adapted. However, a novel semantic meta component for grounded semantic planning has to be developed, which deals with conflicting state configurations.

9.3 The Named Entity Recognition & -Disambiguation Scenario

In NERD, one deals with ambiguity in unstructured text, mostly available on the Web. As tex- tual mentions of entities often are ambiguous, it often remains unclear which real-world entity is referred to. It is thus important to find links between such mentions and entity resources, as available inKBs. For the semantic NERD task, we reuse all experts of Chapter7. We first present the se- mantic descriptions for NER and NED experts. Based on the latter, we then discuss how to integrate semantic meta components for grounded semantic learning in order to weight se- mantic NER- and NED experts. We finally discuss grounded semantic planning, where we extend our TPM approach to deal with conflicting hypotheses of available NER experts.

9.3.1 Semantic Descriptions for Semantic NER & NED Experts Numerous NER, NED and NERD experts are available as Web services and provide rich descriptions of inputs and outputs. While inputs for NER are confined to text, outputs usually consist of mappings of tokens or token sequences to named entities. These mappings consist of start- and end positions of tokens as well as the predicted named entity type (which we do not need for solving NERD). Depending on the implementation of the expert and the Web service, the output might also provide a confidence measure for each mapping. NED Web services usually require an annotated version of the text, where a specific syntax needs to be used. The produced output consists of mappings from a token or token sequences toKB resources, the position of the token or token sequence and, if available, a confidence measure for each prediction. NER and NED Web services might provide further parameters about their functionality. For modeling the preconditions and postconditions, we reuse the NLP Interchange Format (NIF) 11, a RDF vocabulary for describing both Web services and datasets for several NLP tasks. The preconditions for semantic NER experts are depicted in Listing 9.14. They confine text inputs to be sentences or complete paragraphs with adequate NIF annotations. Parameters specific to the respective expert can be set as well, but remain optional. For predicates and

11http://persistence.uni-leipzig.org/nlp2rdf/ (accessed on 05/01/2018)

160 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 13 12 11 10 9 8 7 6 5 4 3 2 1 9 8 7 6 5 4 3 2 1 euenamespace use we NIF, ontology. by provided not concepts utntoelpwt epc otertkn.W oe hsfc yrqiigtilswith triples requiring by fact this model We tokens. their to a respect predicate a (i.e. with in candidate overlap collected entity are not named Annotations must a sequence). of token indexes a end or and token start- about information with annotations tkn d:yesepnlp:Token; rdf:type sepnlp:Annotation; sepnlp:Token; ?token2 rdf:type ?annotation2 rdf:type sepnlp:Annotation; rdf:Bag; ?token1 ?annotation2. rdf:type rdf:type sepnlp:disjunct ?annotation1 ?annotation1 ?col xsd: @prefix rdf: @prefix sepnlp: @prefix nif: @prefix NER are Inputs 9.15. Listing in depicted are sep:ParameterConfig; experts NED semantic of ?type. preconditions The } rdf:type rdf:type nif:Paragraph). ?parameterConfig = { ?textType OPTIONAL || nif:Sentence = (?type FILTER ?text rdf: @prefix http://aifb-ls3vm2.kitedu:8080/experts/sepnlp#> < sepnlp: @prefix nif: @prefix 12 oeta n ol lomdlSAQ lest ietycekfrdisjunction. for check directly to filters SPARQL model also could one that Note sepnlp:disjunct Peodtoso eatcNRexperts. NER semantic of Preconditions 9.14: Listing enptkn?token2; "true"^^xsd:boolean. ?text2; sepnlp:isEntity sepnlp:token ?end1; nif:isString ?start1; ?mention1; ?sentence. nif:referenceContext nif:endIndex nif:beginindex ?token1; nif:anchorOf "true"^^xsd:boolean. ?text1; sepnlp:isEntity sepnlp:token nif:isString ?parameter; ?parameterValue. sepnlp:parameterValue sepnlp:parameter d:i?noain,?annotation2. ?annotation1, rdf:li . h ae niyRcgiin&-iabgainScenario -Disambiguation & Recognition Entity Named The 9.3 rule LD-Fu respective a by inferred directly be can which sepnlp: rdf:Bag orfrt oe,extending novel, a to refer to hr w annotations two where , 161 12 .

9 Chapter 9 Applications & Evaluations of Semantic Expert Processes

28 nif:anchorOf ?mention2; 29 nif:beginindex ?start2; 30 nif:endIndex ?end2; 31 nif:referenceContext ?sentence. 32 OPTIONAL { 33 ?parameterConfig rdf:type sepnlp:ParameterConfig; 34 sepnlp:parameter ?parameter; 35 sepnlp:parameterValue 36 ?parameterValue. 37 } Listing 9.15: Preconditions of semantic NED experts.

Finally, the postconditions of NED experts are stipulated in Listing 9.16. It stipulates that the annotation used as NED expert input is required to be enriched with a disambiguated resource.

1 @prefix nif:. 2 @prefix sepnlp:. 3 @prefix rdf:. 4 @prefix xsd:. 5 6 ?col rdf:type rdf:Bag; 7 rdf:li ?annotation1. 8 9 ?annotation1 rdf:type sepnlp:Annotation; 10 nif:isString ?text; 11 sepnlp:token ?token1. 12 sepnlp:isEntity "true"^^xsd:boolean; 13 sepnlp:resource ?resource. 14 15 ?token1 rdf:type sepnlp:Token 16 nif:anchorOf ?mention; 17 nif:beginindex ?start; 18 nif:endIndex ?end; 19 nif:referenceContext ?text. 20 OPTIONAL { 21 ?parameterConfig rdf:type sepnlp:ParameterConfig; 22 sepnlp:parameter ?parameter; 23 sepnlp:parameterValue 24 ?parameterValue. 25 } Listing 9.16: Postconditions of semantic NED experts.

Every NER and NED expert considered in the scenarios was wrapped as semantic expert. We modeled the descriptions with NLP domain experts and developers of NER and NED experts, and made them available via the respectively developed LAPIs. Note that all semantic descriptions for NER and NED experts are presented in AppendixA. TheKB comprises texts, semantic NER- and NED experts and resulting annotations by executing the latter.

162 sipt ie h noaincandidates annotation the Given input. as tweet: following cho- the been pro- consider have might example, mappings experts an entity give NED token-/named To multiple semantic sen. as of configurations, groundings state precondition multiple scenario, duce TPM the to Similar NERD for Planner Semantic Grounded A 9.3.2 loih 5 Algorithm o o oeigmr ope olcin,weerltosaogeeet r ae into taken al- 5). are to (Algorithm elements 4 subsequently among Algorithm summarized relations is extend where the algorithm thus The in collections, We modeled account. complex be more experts. to modeling needs semantic for element of predicate low bag postconditions single (i.e. and a overlap where pre- constructs not respective bag must simple annotations with dealt two only of tokens the annotations, sepnlp:disjunct of bag a Given challenges. two player basketball 14: 13: 12: 11: 10: 4: 3: 2: 1: 9: 8: 7: 6: 5: I ucl eoe nrcal otyot(..qeyeprswt)alvldannotations valid all with) experts query (i.e. out try to intractable becomes quickly It (ii) experts. NED of preconditions the modeling by (i) problem approached partially We text full the take experts semantic available all thus, and, NER for ambiguity no is There W aet xrs that express to have We (i) o all for collector return S for end , etantto u oteroverlap. their to due annotation text adtu ugt r imposed. are budgets thus and NED for A condition configurations o all for candidates q collector for end ← ← bag getGroundings S configurations unfiltered ihe odni aosbsebl player. basketball famous a is Jordan Michael getQueryExtended , getValidGroundingsExtended A n ∈ ← { ∈ Preconditions /0 ← ← 2 ← ,..., eae hm ihrgrst hi tr n n nee.B o,we now, By indexes. end and start their to regards with them) relates getCondition collector queryState ← wihcauses which NED for annotations valid construe to has now one ← | candidates . h ae niyRcgiin&-iabgainScenario -Disambiguation & Recognition Entity Named The 9.3 subsets ( configurations /0 ← ihe Jordan Michael configurations ( bag e ∪ do configurations ( ( n ) q ( , bag , | candidates s do ) ) Michael ( , e Preconditions , s ) ∪ and ) filter , Michael Jordan ( unfiltered , e , ihe Jordan Michael s utntb sdfrone for used be not must ) , condition ) 163 and

9 Chapter 9 Applications & Evaluations of Semantic Expert Processes

The altered details of the algorithm are as follows 13:

• Line 4: The condition for the bag is retrieved, which filters the modeled relations among all pairwise triples of the bag.

• Line 5: Other than in the standard case with a single element in the bag, we now need to actively select the element to retrieve all unfiltered resources.

• Line 8: Similar to the original algorithm, we first retrieve all possible configurations (in set unfiltered) regardless of the condition.

• Line 9: The configurations are filtered according to the condition (in method filter, which has to be tested for all pairs of resources. The rest of the algorithm is equivalent to Algorithm3.

Example 9.1 illustrates the procedure for a piece of text with two available annotations. For grounded states in theKB, we use namespace sepkb:.

Example 9.1 (Grounding for NED). Given state s with text Michael Jordan is a famous basketball player and decision candidates 1s =Michael Jordan and 2s =basketball and NED expert e with preconditions as shown in Listing 9.15, we need call Algorithm4 with Algorithm5 to ground the available bag of annotations with respect to finding configuration decision candidates 1s and 2s. Listing 9.17 shows the resulting condition of a LD-Fu rule, where resource sepkb:token1 denotes 1s and resource sepkb:token2 denotes 2s.

1 @prefix nif:. 2 @prefix sepnlp:. 3 @prefix sepkb:. 4 @prefix rdf:. 5 @prefix xsd:. 6 7 ?col rdf:type rdf:Bag; 8 rdf:li ?annotation1, 9 ?annotation2. 10 11 ?annotation1 rdf:type sepnlp:Annotation; 12 nif:isString ?text; 13 sepnlp:token sepkb:token1; 14 sepnlp:isEntity "true"^^xsd:boolean. 15 16 ?annotation2 rdf:type sepnlp:Annotation; 17 nif:isString ?text; 18 sepnlp:token sepkb:token2; 19 sepnlp:isEntity "true"^^xsd:boolean. 20 21 sepnlp:token1 rdf:type sepnlp:Token.

13For the general description see the details of Algorithm3

164 30 29 28 27 26 25 24 23 22 ae entities named getValidGroundings besatsaes s state start able retrieved the use candidate we sampled each assignments, for entity clusters named new conflicting create several to we be samples specifically, might More there As state. candidate configured round. entity a named each for for assignments clusters create entity first named valid derive we until budget expert (i.e. weights the restrict to have thus we outputs, ( their configurations weighted of has possible number learner all semantic grounded to a corresponds and resulting candi- executions cuted The decision NED steps. conflicting of of number number propose n increasing the and an for disagree with bound severe often more upper experts becomes NED problem The or dates. NER of number a ,M 9.4), Definition (see latter the of initializeGroundedPlanning properties eral MDPs) 9.4 Definition 9.3. Definition do extends which which entities NERD, named to tokens from mapping a have we overlap. not i.e. reached, is annotation overlap valid which candidates the (i.e. candidates BGP H = + aigtesto candidate of detail: in set algorithm the the having explain now 6, We algorithm in summarized and formalized is procedure The For a propose We for SEP Grounded resulting the define we algorithm, sampling the presenting Before nadto orsrcin,w antsml n igetaetr otega tt,as state, goal the to trajectory single a find simply cannot we restrictions, to addition In sepnlp:ParameterConfig; sepnlp:Token. sepkb:token2. rdf:type ?parameterConfig sepnlp:disjunct { OPTIONAL rdf:type sepkb:token1 sepnlp:token2 1 1 . ,..., initializeGroundedPlanning . rl o E xetadeepaysaeconfiguration. state exemplary and expert NED for rule LD-Fu 9.17: Listing w M MDP infinite-horizon Given e | ( tokens d s ) s ftegone eatclanraddasapeendnme fsamples of number predefined a draws and learner semantic grounded the of CAND sampling RDF 1 ( C Etne euto fGone Est Infinite-horizon to SEPs Grounded of Reduction (Extended s EXP ) ere btatpolicy abstract learned , sinput. as enpprmtr?parameter; sepnlp:parameterValue sepnlp:parameter | eprshvnbe exe- been haven experts NER semantic several After text. state a of ) hr h eutn tt pc ssmldwith sampled is space state resulting the where , ) eieaieycutralnmdett candidates entity named all cluster iteratively We SEPs). of prahwihgnrtspoaiiydsrbtosbsdo expert on based distributions probability generates which approach S CONF . h ae niyRcgiin&-iabgainScenario -Disambiguation & Recognition Entity Named The 9.3 ewn h ruddsmni lne obuild. to planner semantic grounded the want we ) ( s RDF 1 , euse we , π ABSTR ?parameterValue. SEP GRND d s , in E getValidGroundingsExtended π , d s ABSTR s .Tesmln rcs otne ni a until continues process sampling The ). s BGP H n h gen- the and 9.1) Definition (see odcd fw osdri o sampling a for it consider we if decide to + 1 ) xetstEadaalbegoal available and E set expert , SEP GRND d ihteavail- the with 4) (Algorithm s lsisdpnetnmdentity named dependent its plus sisatae ycalling by instantiated is sample n gas(with -grams ( S h nta of instead ) . d s 165 ∈ s


Algorithm 6 sample(sCAND)
1:  clusters ← ∅
2:  for all ds ∈ sCAND do
3:      clusters ← clusters ∪ (ds, ¬ds)
4:  end for
5:  merge ← true
6:  while merge do
7:      for all cluster ∈ clusters do
8:          PCLUSTER ← getProbabilityDistribution(cluster)
9:          cluster ← draw(PCLUSTER)
10:     end for
11:     merge ← checkValidity(clusters)          // false if assignment is valid
12:     if merge then
13:         sDEP = invalidOverlaps(clusters)      // increasingly ordered by size
14:         clusters ← ∅
15:         for all ds ∈ sDEP do
16:             if ¬inCluster(ds, clusters) ∧ ∀cs ∈ ds.DEPS : ¬inCluster(cs, clusters) then
17:                 clusters = clusters ∪ (ds ∪ ds.DEPS)
18:             else
19:                 if ¬inCluster(ds, clusters) then
20:                     cluster = getCluster(ds, clusters)
21:                     cluster = cluster ∪ ds
22:                 end if
23:             end if
24:         end for
25:     end if
26: end while
27: s ← transformToState(clusters)
28: return s

• Line 1: The set of clusters is initialized, where the latter will constitute the eventual sampled state. That is, each cluster's decision candidate must not overlap with another cluster's decision candidate within the set of clusters.

• Line 2-3: A cluster is created for each decision candidate, also containing its negation. Each cluster should initially contain a single decision candidate and thus depicts the choice of choosing the decision candidate or not.

• Line 5: The merge variable is initialized as true. It depicts whether the available clusters have to be merged, as their decision candidates constitute conflicts.

• Line 6-9: As long as no valid set of clusters has been found, we need to draw a decision candidate from each cluster, where PCLUSTER : S → [0, 1] maps decision candidates to probability values and constitutes a probability distribution, i.e. ∑s PCLUSTER(s) = 1.

• Line 11: The validity of clusters is checked in terms of textual overlap of decision candidates of two clusters as well as within clusters of size larger than one. The method returns false if the assignment is valid, i.e. no merging has to follow, which is the goal of the algorithm.

• Line 12-13: If merging needs to happen, we gather all invalid overlaps (i.e. conflicts), where sDEP is the resulting set with all decision candidates of the available clusters, and store links to all conflicting decision candidates in the candidate-dependent set DEPS. The set sDEP is increasingly ordered by the length of the dependency sets, as otherwise decision candidates with smaller numbers of tokens might be neglected.

• Line 14-24: After re-initializing the clusters set, we iterate through all currently chosen decision candidates and use the conflict link sets to build novel clusters. More specifically, if a decision candidate ds ∈ sDEP and all its conflicts cs ∈ ds.DEPS are not yet in a cluster, they constitute a novel cluster (ds ∪ ds.DEPS). Otherwise, we add the current decision candidate to an available cluster if it was not already assigned in a prior iteration. The dependent decision candidates will then be added to existing clusters or form novel clusters in subsequent rounds, as they are not assigned yet.

• Line 27-28: Finally, we have to transform the clusters into the configured state by simply gathering all decision candidates from all available clusters. The configured state is then returned.

Note that the algorithm takes into account all possible state configurations: not resolving overlaps of decision candidates would bias the sampling process, e.g. when n-grams yield dependency chains such as 1s ↔ 2s ↔ 3s, sampling a single decision candidate of a single cluster (with all decision candidates in it) might cause numerous decision candidates to be neglected (see the Complexity Analysis below for an example). We now discuss the complexity of the algorithm.


Complexity Analysis 9.2. The worst case happens due to decision candidate dependencies such that 1s ↔ ds, 2s ↔ ds, ..., d−1s ↔ ds (caused, for example, by n-grams for textual decision candidates), as this might cause the while loop (lines 6-26) to sequentially conduct |D| − 1 draws due to the ordering of the dependency sets (which can be done in O(|D| log |D|) with sorting algorithms such as merge sort or heapsort). The increasing ordering, as mentioned before, is essential to take into account decision candidates with smaller numbers of tokens if n-grams are available. The merge loop additionally iterates through all decision candidates, pointing to their dependencies (lines 15-24), which causes quadratic complexity. The computational complexity of the algorithm is thus within O(|D| + |D|²) = O(|D|²).
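To make the control flow of Algorithm 6 concrete, the following is a minimal Python sketch under simplifying assumptions: decision candidates are token spans (start, end), two candidates conflict iff their spans overlap, a uniform distribution stands in for PCLUSTER, drawing None models the negated candidate, and the re-clustering step uses the static conflict structure of all candidates rather than only the invalid overlaps of the current draw. All helper names are illustrative and not taken from the thesis implementation.

    import random
    from itertools import combinations

    def overlaps(a, b):
        # Two token spans (start, end) conflict iff they textually overlap.
        return a[0] < b[1] and b[0] < a[1]

    def deps(d, candidates):
        # DEPS of d: all other candidates that conflict with d.
        return {c for c in candidates if c != d and overlaps(c, d)}

    def sample(s_cand, rng=random.Random(0)):
        candidates = list(s_cand)
        # Lines 1-4: one cluster per candidate; None models its negation.
        clusters = [{d, None} for d in candidates]
        while True:
            # Lines 6-10: draw one element per cluster (uniform stand-in).
            chosen = [d for d in (rng.choice(sorted(c, key=str)) for c in clusters)
                      if d is not None]
            # Line 11: valid iff no two drawn candidates overlap.
            if not any(overlaps(a, b) for a, b in combinations(chosen, 2)):
                # Lines 27-28: drawn candidates form the configured state.
                return set(chosen)
            # Lines 12-13: candidates, increasingly ordered by number of conflicts.
            s_dep = sorted(candidates, key=lambda d: len(deps(d, candidates)))
            # Lines 14-24: re-cluster a candidate together with its conflicts.
            clusters = []
            for d in s_dep:
                in_cluster = any(d in c for c in clusters)
                deps_clustered = any(any(x in c for c in clusters)
                                     for x in deps(d, candidates))
                if not in_cluster and not deps_clustered:
                    clusters.append({d, None} | deps(d, candidates))
                elif not in_cluster:
                    next(c for c in clusters if deps(d, candidates) & c).add(d)

    # Overlapping n-gram spans over a four-token sentence:
    print(sample([(0, 1), (0, 2), (1, 3), (3, 4)]))

In the example, the two overlapping prefixes (0, 1) and (0, 2) and the span (1, 3) end up in one merged cluster after the first re-clustering, so at most one of them survives into the returned state.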

9.3.3 Evaluation

We evaluate the grounded semantic planner by running the complete NERD SEP with the abstract semantic planner as introduced in Section 9.2.2, the grounded semantic learner as introduced in Section 9.2.3, and the novel grounded semantic planner as introduced for the semantic NERD task in Section 9.3.2.

9.3.4 Setup

Kernels: We use the same setup in terms of kernels, densities and (semantic) experts as described in our learning evaluation in Chapter 7. The only restrictions are the execution budgets M1 = M2 = |E|, where a single state is to be generated after each step.

Experts: We use the same experts as in our learning evaluation in Chapter 7.

Evaluation measures: We focus on outcome optimization of SEPs and, as for surgical phase recognition (see Section 9.2.3), use the grounded semantic learner (see Section 9.2.2) we lifted based on Single-Avg (see Chapter 7), which uses single-expert meta dependencies to calculate expert weights. Other than in the MEP and EP evaluations, we are now in the batch learning setting, where we re-train the grounded semantic learner at each query. While this significantly increases computational complexity, the results might improve compared to the pure online learning case. The approach is, again, referred to as Batch-Single-Avg.
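As a rough sketch of this batch setting (not the actual Batch-Single-Avg implementation), the learner can be refit on the full feedback history before answering each new query. Per-expert average correctness is used below as a crude stand-in for the single-expert meta dependencies, selection is purely greedy for brevity, and all names are placeholders.

    def fit_single_avg(history):
        # Placeholder batch learner: average correctness per expert over all feedback.
        totals, correct = {}, {}
        for _, expert, was_correct in history:
            totals[expert] = totals.get(expert, 0) + 1
            correct[expert] = correct.get(expert, 0) + int(was_correct)
        return {e: correct[e] / totals[e] for e in totals}

    def batch_single_avg(queries, experts, feedback):
        # Re-train on the complete history at every query (batch re-training).
        history, answers = [], []
        for query in queries:
            weights = fit_single_avg(history) if history else {e: 1.0 for e in experts}
            best = max(experts, key=lambda e: weights.get(e, 0.0))
            hypothesis = best(query)
            answers.append(hypothesis)
            # feedback(query, hypothesis) -> bool, e.g. derived from a reference data set.
            history.append((query, best, feedback(query, hypothesis)))
        return answers

    # Toy usage: two "experts" deciding whether a token is an entity mention.
    experts = [str.istitle, str.isupper]
    print(batch_single_avg(["Rome", "is", "NICE"], experts,
                           lambda q, h: h == q.istitle()))

The refit at every query is what drives up the computational cost compared to a purely online update of the same weights.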

9.3.5 Results

The results are summarized in Table 9.6. While the results for the batch learner are better than in the online case (see Table 7.6), the key result of the evaluation is that the grounded semantic planner created grounded plans that correctly queried the respective semantic experts.

Table 9.6: Evaluation results for the NERD SEP.

Approach          | Microposts'14        | Spotlight
                  | F1   | Prec | Rec    | F1   | Prec | Rec
Batch-Single-Avg  | 0.51 | 0.74 | 0.4    | 0.4  | 0.5  | 0.37

9.4 Discussion

We first discuss the generalizability of semantic meta components and semantic experts to new domains or tasks, and then deal with the problem of different reward impacts for different domains and tasks.

9.4.1 On the Generalizability of Semantic Meta Components

The ability to generalize and reuse the semantic meta components we developed throughout this work is dependent on (i) the flexibility of the underlying approach and (ii) its semantic description. (i) Our grounded semantic learner is built on a generic meta component based on meta dependencies and expert weight learning and is not bound to NLP tasks, and the abstract and grounded semantic planning based on MDP planning is completely generalizable. (ii) In addition, both need domain-dependent semantic descriptions to be modeled for correct semantic execution and planning. The semantic descriptions for grounded semantic learners can, however, be inferred automatically from the respective semantic experts, while semantic meta planners need to capture goal-dependent task constraints, such as the token or token sequence constraint per decision for NERD.

To this end, one could relax the impact of the pre- and postconditions and shift the decision to grounded semantic learning components, i.e. have them learn the eligibility of a pre- or postcondition and thus model its generic applicability via a datapoint and semantic expert feedback. This might be useful for dealing with situations where few semantics are available or for generalizing semantic meta components as well as semantic experts.

Also, using the framework for novel tasks is dependent on the goal of the respective task. More specifically, if a novel task could theoretically be solved by available semantic experts, their semantic description in terms of functionality might not capture all semantic annotations needed for the task's goal. To this end, the respective LAPI might have to be adjusted, as extended information might influence the underlying expert.


9.4.2 On Domain-dependent Impacts of Rewards

Grounded semantic learning entails either learning R directly or approximating expert weights. We neglected the aspect that R might have different interpretations and impacts for different domains. While for NLP techniques applied to Web-based tasks, choosing the highest estimated reward measured on past executions might not cause problems, it might have critical and ethical consequences in medical scenarios. Provenance metadata might have to be extended with more specific criteria for decision-making, where manually defined decision policies potentially need to be followed based on strict medical guidelines. Hence, a static semantic expert pipeline might be preferred for all data points, as it is trusted by physicians. A related challenge is that institutions might differ in their perceptions of what conceptual provenance metadata is needed to ensure trustworthiness or that different guidelines are followed. Hence, reusing task outcomes published as Linked Data requires developing strategies to deal with uncertainties coming from missing information.
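A minimal sketch of the kind of guard discussed above, with purely hypothetical names: if a data point falls under a regulated domain, a manually defined, trusted static pipeline is returned for it; only otherwise is the learned, reward-driven selection used.

    def choose_pipeline(datapoint, learned_policy, static_pipeline,
                        regulated_domains=("medical",)):
        # Prefer the manually defined, physician-trusted pipeline for regulated
        # data points; fall back to the learned, reward-driven selection elsewhere.
        if datapoint.get("domain") in regulated_domains:
            return static_pipeline
        return learned_policy(datapoint)

    # Illustrative use with made-up component names:
    static_pipeline = ["phase_recognition_expert", "guideline_checker"]
    learned_policy = lambda dp: ["ner_expert_a", "ned_expert_b"]
    print(choose_pipeline({"domain": "medical"}, learned_policy, static_pipeline))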

9.5 Summary

In this chapter, we presented applications and evaluations of SEPs in medical assistance as well as NLP. Based on our general Web automation architecture for medical assistance, we first dealt with the TPM use case, where we presented semantic experts as well as an abstract semantic planner and a grounded semantic planner to automatically discover them. Both semantic planners were developed as reductions of Abstract or Grounded SEPs to infinite-horizon MDPs, which can be solved via standard MDP planning algorithms. We evaluated the semantic experts in terms of time-efficiency with respect to the local execution of their wrapped functionality and were able to show that the semantic expert concept is suitable for medical assistance experts. Our evaluations for semantic meta components additionally validated that we can automatically discover as well as compose expert pipelines for medical assistance.

We presented further semantic experts for Surgical Phase Recognition and developed a grounded semantic learner, which encapsulates a batch version of methods presented in Part II of this thesis. It exploited similarity measures defined on activity triples as well as single-expert meta dependencies. The evaluation of the latter verified that the respective semantic Phase Recognition experts were correctly executed and that the grounded semantic learners achieved robust and partially superior results.

We finally dealt with our NERD application, where we first presented semantic experts for NER and NED. While we could reuse the abstract semantic planner of the TPM scenario, we extended the grounded semantic planner with a state sampling technique as well as the ability to define pairwise bag constraints. The grounded semantic learner for NERD, similarly to Phase Recognition, wrapped a batch version of techniques proposed in Part II of this thesis. Our evaluation showed that the resulting Web architecture correctly executed all semantic experts and semantic meta components, and achieved good outcomes for the Grounded NERD SEP.

Our work on applying semantic experts to TPM, Surgical Phase Recognition and NERD, as well as evaluating their time-efficiency, answered Research Question 4, where the corresponding Hypothesis 4 can be confirmed. Similarly, our work on the resulting Web architectures for medical assistance and NERD, as well as the developed semantic meta components and their evaluations in terms of abstract semantic planning, grounded semantic planning and grounded semantic learning, answered Research Question 5, where the corresponding Hypothesis 5 can be confirmed.


Part IV

Conclusion

This part concludes the thesis by first summarizing the presented contents in terms of research questions, hypotheses and contributions (Chapter 10). We finally point out future research directions for each of the stipulated hypotheses in Chapter 11.

Chapter 10 Summary

In this chapter, we summarize the contents of this thesis by revisiting all stipulated hypotheses and corresponding research questions, highlighting our respective contributions. We start with our hypotheses with respect to the learning problem.

Hypothesis 1. By learning expert weights for individual task instances, which express an expert's ability to support generating the correct task solution, and by choosing experts for multi-step tasks accordingly, we can maximize the number of correct solutions (Chapter 5).

Research Question 1 dealt with finding an adequate formalization for solving multi-step tasks with budgeted expert advice. We introduced EPs, a decision-theoretic framework extending MDPs with expert advice and budget constraints. The target property of an expert weight for a decision candidate in EPs can be defined analogously to the well-known Bellman equations for MDPs, stipulating that a policy has to maximize the expected cumulative reward over future steps. By modelling all relevant learning challenges (and engineered rewards which have a positive impact on future steps), we partially confirmed Hypothesis 1, as the latter can only be fully confirmed with adequate learning methods, which we target in Hypothesis 2.

Hypothesis 2. Using relational features for individual task instances enables RL to deal with the exploration-exploitation problem and expert budgets in order to maximize the number of correct solutions for multi-step tasks (Chapter 5 & 7).

In Research Question 2, we targeted learning methods for EPs to maximize the expected cumulative reward. Our contributions comprise two model-free approaches, where we directly approximated expert weights for decision candidates in the online and the batch RL setting, respectively. While our (mini-)Batch RL method relies on a lightweight and easily scalable LGM, our Online (mini-)RL approach exploits a relational feature representation which consists of complex meta dependencies, i.e. measures to quantify the relative performance of an expert in diverse neighborhoods of a decision candidate. We describe our evaluations for the current hypothesis after introducing Hypothesis 3.

Hypothesis 3. Framing expert selection and -combination for multi-step tasks as Multiagent coordination can improve solving multi-step tasks while keeping the resulting system flexible (Chapter 6 & 7).

Research Question 3 deals with the central problem of expert correlation in multi-step tasks,

which potentially introduces biases to learning methods for EPs. We extended EPs with multiple agents, where each agent manages exactly one expert in MEPs. The latter enable to treat the correlation problem as expert coordination. To this end, we explored both the lightweight independent learner protocol as well as the rich joint-action learner protocol and developed respective methods.

We evaluated our learning methods for EPs as well as MEPs for the NERD task and showed that, for both tweets and news articles, our methods were able to alternately dominate all baselines, which consisted of, among others, the monolithic systems as deployed on the Web. We were thus able to confirm Hypotheses 2 & 3, and in retrospect also confirm Hypothesis 1 concerning our general-purpose framework.

We now deal with our hypotheses, research questions and resulting contributions for the problem of Web automation of multi-step tasks.

Hypothesis 4. Given a multi-step task with available experts, structuring experts and task instances with lightweight annotations modeled with the Resource Description Framework (RDF) as well as exploiting HTTP methods to execute the services is sufficient to find and execute expert services (Chapter 8 & Chapter 9).

Associated Research Question 4 focused on the problem of heterogeneity in expert functionalities, implementations and interfaces available on the Web. We proposed lifting experts to semantic experts, which are LAPIs able to provide access via HTTP methods, introspection via machine-readable descriptions and provenance via execution-specific groundings of the latter. We applied the concept of semantic experts to NERD as well as medical assistance experts and showed that no significant time overhead is generated. The latter are thus eligible for automating multi-step tasks on the Web, which confirms Hypothesis 4.

Hypothesis 5. Expert services lifted with concepts of the Semantic Web can be executed on a data-driven basis, where decision-theoretic models enable planning their execution as well as integrating expert weight learning approaches to solve multi-step tasks (Chapter 8 & Chapter 9).

Research Question 5 deals with enabling to efficiently find suitable experts for a step and automatically execute the latter without human intervention. Our contribution comprised a data-driven approach for semantic expert execution based on the declarative rule-based engine LD-Fu. We extended EPs with semantic concepts, yielding SEPs, to enable the reuse of MDP planning techniques for semantic expert discovery and the integration of expert weight learning methods as proposed throughout this thesis. SEPs with respective semantic meta components have been applied to both the NERD and the two medical assistance tasks and evaluated in terms of running exemplary SEPs. Hypothesis 5 was thus confirmed.

Chapter 11 Future Work

This chapter points out future work with respect to the frameworks and methods proposed in this thesis.

We begin by discussing future work with respect to the framework of EPs (Hypothesis 1), which enable to reuse, extend and develop experts and RL algorithms that learn how to choose and combine their hypotheses. An important direction with regards to real-world applicability is making expert budget constraints more flexible. The underlying assumption is that one could save expert executions for decision candidates (or data points) about whose expert performance one is confident. The saved budget can then either be distributed to decisions (or data points) where one is unsure or lead to overall cost savings. This extension is far from trivial, as one has to re-model EP state and action spaces and carefully engineer reward functions to achieve convergent behavior. Reward functions for such an extended EP have to be manually engineered and empirically evaluated, and it is not clear how one could extend an EP with such finer-grained decisions. Inverse Reinforcement Learning (IRL) might be the most innovative and thus fruitful direction, as one elegantly avoids manually engineering the reward function: by showing the optimal subset of expert executions and hypothesis combinations, the reward function could be recovered such that an agent learns to take the important decisions in order to maximize the expected cumulative reward. IRL methods significantly differ from the RL methods for the EP extension and thus open an extending line of research.

A possible line of future work for learning methods for EPs (Hypothesis 2) comprises evaluating pure RL algorithms, such as value-based [133] or Actor-Critic (e.g. TRPO [115]) algorithms, for EPs. The methods we proposed, on the one hand, dealt with unstructured data (such as text) where, on the other hand, relatively small hand-labeled reference data sets needed elaborate feature representations to enable generalization. Pure value-based Deep RL (such as Deep Q-Learning [93]) might not converge on such data; as a consequence, using pure RL algorithms with the proposed feature representations (e.g. meta dependencies) is promising.

To this end, it is crucial to study convergence behaviors of the methods proposed in this thesis, as convergence is the central means to quantify the performance of RL algorithms. While our empirical evaluation of the latter is an important first step, theoretical frameworks such as Probably Approximately Correct (PAC) [77, 132] and Knows What It Knows (KWIK) [87]

fundamentally approach the study of convergence by proving respective guarantees.

Future works concerning the reduction of expert correlation for multi-step tasks by Multiagent coordination (Hypothesis 3) might lead towards the study of distributed systems, where responsible agents cannot be centralized because respective experts are owned by independent parties, or the system needs to scale to very large amounts of experts. The former reason is of interest on its own, as real-world scenarios often comprise independent stakeholders with mostly cooperative intentions but individual power. A prominent example is medical assistance, where experts might represent models trained from data of different institutions or physicians themselves. Here, such coordination protocols and rewards paired with adequate incentives might enable applying MEPs in medical settings.

To achieve complete decentralization, one has to optimize communication among agents. The resulting constrained communication problem entails dealing with communication actions, as we already defined for MEPs. Recent advances in costly feature selection [47] based on bandit algorithms can be reused and extended, where one would model all agent combinations of the given communication budget length as possible communication actions (see the sketch below). An alternative direction is using deep neural networks, which have been successfully applied to learning how to communicate [46].

Taking the opposite perspective for future works, a promising line of research deals with using Multiagent coordination for centralized decision-making [82, 134], which has led to good results for game playing (which currently is the main proof-of-concept for novel RL algorithms). By adding a superior meta-agent, collecting the decisions of the coordination agents and taking global decisions, one could improve the overall performance of MEPs for solving multi-step tasks.

We next discuss future works for lifting experts to semantic experts (Hypothesis 4), leading to knowledge integration for multi-step tasks. A core assumption of the lifting process was the manual annotation and wrapping process of expert services (or respective local implementations) to semantic experts. Here, building on advances in the automatic deployment of Docker containers for semantic experts is an essential next step, as this leads to better scalability as well as availability. Besides easing and automating the wrapping process of experts, the development of suggestion mechanisms with regards to manually creating semantic descriptions (and most importantly pre- and postconditions) based on supervised learning is important. Such methods can only be developed by extending prominent service provider platforms, such as Algorithmia, with semantic annotations for experts and providing real-world incentives for end users to build the manually annotated ground truth.

To this end, we also require semantic annotations to match perfectly, i.e. the same vocabularies have to be used for all semantic experts to enable exchangeability as well as pre- and postcondition matchings. There are future directions one might take to add flexibility. A first step towards learning semantic matchings would be to assume simple typed annotations with given upper classes, e.g. spc:Image, to restrict the search space. One might then learn supervised models based on the eventual positive or negative outcomes of a task when using a specific expert with uncertainty in its annotations.
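As a rough illustration of the constrained-communication idea mentioned above (not the costly feature selection method of [47] itself), every agent subset that fits into the communication budget can be treated as one arm of a standard UCB1 bandit. Agent names, the reward signal and the round count below are placeholders.

    import math
    import random
    from itertools import combinations

    def ucb1_select(stats, t):
        # stats: arm -> (pulls, mean_reward); try every arm once, then use UCB1.
        for arm, (n, _) in stats.items():
            if n == 0:
                return arm
        return max(stats, key=lambda a: stats[a][1]
                   + math.sqrt(2 * math.log(t) / stats[a][0]))

    def run_bandit(agents, budget, reward_fn, rounds=1000):
        # Each communication action is a subset of agents within the budget length.
        arms = [frozenset(c) for k in range(budget + 1)
                for c in combinations(agents, k)]
        stats = {arm: (0, 0.0) for arm in arms}
        for t in range(1, rounds + 1):
            arm = ucb1_select(stats, t)
            r = reward_fn(arm)            # e.g. task outcome after communicating
            n, mean = stats[arm]
            stats[arm] = (n + 1, mean + (r - mean) / (n + 1))
        return max(stats, key=lambda a: stats[a][1])

    # Toy usage: communicating with agent "B" happens to be most useful.
    best = run_bandit(["A", "B", "C"], budget=2,
                      reward_fn=lambda arm: random.random()
                      + (0.5 if "B" in arm else 0.0))
    print(sorted(best))

In a MEP setting, the reward signal would come from the eventual task outcome after the selected agents exchanged their hypotheses, so the bandit gradually concentrates the communication budget on the most useful agent combinations.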

Besides, in the case of abstract planning we only leveraged semantics to a small degree, namely in terms of pre- and postcondition matchings. To better study the potential advantages of semantic liftings of experts, one could exploit the state- and action spaces of relational MDPs, which abstract as well as grounded SEPs do not exploit yet. As a consequence, one could start with more efficient planning and learning algorithms, which are especially useful when the number of state groundings increases in size for tabular representations. To this end, defining logical variables in SEP rules (as practiced in LOMDPs) also enables stochastic transition functions to model uncertainties of imperfect matchings of semantic expert conditions or the availability of semantic conditions.

Future works for SEPs and the proposed semantic meta components for planning and diverse learning problems (Hypothesis 5) might be concerned with extending SEPs to distributed systems, which are essential for the development of Web- and Internet of Things (IoT) applications. The novel and fruitful idea behind working towards Multiagent SEPs is to serve end users with self-governed components, which autonomously aggregate their needed inputs as well as coordinate and communicate with others. To this end, one has to fundamentally re-design the data-driven execution (embodied by the underlying tool LD-Fu), as the semantic concepts need to be enriched with agent communication and coordination capabilities. Numerous problems arise, as communication cannot be exhaustive and coordination is still central, but must yield fair and sustainable consensuses.

Besides distributing SEPs, future works concerning further real-world constraints (as, for example, in medical decision-making) are important. The latter might comprise the security of accessing and storing sensitive semantic experts and their hypotheses, or the choice of hypotheses when the involved decisions or combinations have detrimental impacts.

An additional future work direction we want to mention deals with enabling focused, goal-oriented developments and liftings of semantic experts by service platforms, where the behavior of users on one class of experts might provide the means to recover the reward function via IRL. By studying Web automation on a specific end user platform, where end users model which experts are needed, one could better understand where the intrinsic starting points for manually developing semantic meta components are.

Finally, while we already generate provenance metadata based on semantic expert as well as semantic meta component executions, reusing established ontologies (such as OPMW) is beneficial and might open further research opportunities and challenges. One example might be discovering and selecting semantic experts when no generated training data is available, which requires examining if sufficient contextual information is available from other institutions (published as Linked Data). To this end, guaranteeing reproducibility of results and determining trust in these outcomes are essential.

With regards to learning in SEPs, we also only dealt with lifting batch learning approaches (see Chapter 5). As an extensive part of our proposed expert weight learning supports online learning, which yields higher degrees of flexibility, enabling such liftings is beneficial. However, this either requires to keep a state or store the learned expert weights to gradually update the latter.

Finally, as we mentioned critical reward impacts for sensitive domains such as medicine, further qualitative and quantitative evaluations are needed to work towards richer provenance models and thus higher end user acceptance of automatically pipelined workflows, e.g. end user questionnaires to quantify trust thresholds for suggested task solutions.

To summarize, while this thesis contributes to decision-making for multi-step tasks with expert advice, it also sets the ground for diverse, fruitful research directions.

180 Bibliography Ade .BroadSihrMhdvn eetavne nheacia enocmn learn- reinforcement hierarchical in advances Recent Mahadevan. Sridhar and Barto G. Andrew [9] fields: random markov Hinge-loss Getoor. Lise and London, Ben Huang, Bert Bach, H. Stephen [8] MPE Scaling O’Leary. P. Dianne and Getoor, Lise Broecheler, Matthias Bach, nonstochas- H. The Stephen [7] Schapire. E. Robert and Freund, Yoav Cesa-Bianchi, Nicolò Auer, Peter [6] processes. flexible in composition service Adaptive Pernici. Barbara and Ardagna Danilo [5] Andreas Hahn, Tobias Gengenbach, Thomas Baumann, Martin Augustin, Werner Anzt, Hartwig [4] with prediction Budgeted Turaga. S. Deepak and Tesauro, Gerald Kale, Satyen Amin, Kareem [3] portable A Makeflow: Thain. Douglas and Bui, Peter Donnelly, Patrick Albrecht, Michael [2] Mon- Marco Mello, Paola Lamma, Evelina Gavanelli, Marco Chesani, Federico Alberti, Marco [1] S0097539701398375. ing. USA WA, Bellevue, 2013, UAI Intelligence, Artificial In in Uncertainty prediction. structured for inference Convex Neural Lake on 2012, Conference 3-6, December Annual held USA In 26th meeting Nevada, Tahoe, a of 25: optimization. Proceedings 2012. Systems consensus Systems Processing Processing with Information Information fields Neural random in markov Advances continuous constrained for inference problem. bandit multiarmed tic Eng. Software Trans. Per- High for Tools Parallel on Workshop 2011 International September 5th 978-3-642-31476-6_12. Dresden, the ZIH, of Computing, In Proceedings formance Hiflow Chan- - package. Schmidtobreick, 2011 Wlotzka. element Martin ing Mareike finite and parallel Wilhelm, Schick, hardware-aware Florian A Michael Weiss, Jan-Philipp Ronnas, Subramanian, dramowli Staffan Nestler, Ritterbusch, Andreas Lukarski, Sebastian Dimitar Ketelaer, Eva Heuveline, Vincent Helfrich-Schkarbanenko, USA Texas, Austin, In advice. expert In Technologies, and grids. Engines Execution and Workflow USA Scalable clouds, AZ, on Scottsdale, clusters, Workshop SWEET@SIGMOD, SIGMOD on ACM computing 1st intensive the data for abstraction Semantic Italian In 3rd the interaction. Italy of Pisa, service Proceedings Superiore, Perspectives, web Normale and Scuola smart Workshop, Applications Web for Web reasoning Semantic Policy-based - 2006 Torroni. Paolo and tali, iceeEetDnmcSystems Dynamic Event Discrete rceig fteTet-it AICneec nAtfiilIntelligence, Artificial on Conference AAAI Twenty-Ninth the of Proceedings ae 4029,2015. 2490–2496, pages , ae 6327,2012. 2663–2671, pages , 36:6–8,20.di 10.1109/TSE.2007.1011. doi: 2007. 33(6):369–384, , IMJ Comput. J. SIAM 312:17,20.di 10.1023/A:1022140919877. doi: 2003. 13(1-2):41–77, , WE 1,pgs1111.AM 2012. ACM, 1:1–1:13. pages '12, SWEET , rceig fteTet-it ofrneon Conference Twenty-Ninth the of Proceedings 2006. , 21:87,20.di 10.1137/ doi: 2002. 32(1):48–77, , ae 3–5,21.di 10.1007/ doi: 2011. 139–151, pages , ol o ihPromneComput- Performance High for Tools 2013. , rceig of Proceedings SWAP IEEE 181 3 :

12 Bibliography

[10] Amparo Elizabeth Cano Basave, Giuseppe Rizzo, Andrea Varga, Matthew Rowe, Milan Stankovic, and Aba-Sah Dadzie. Making sense of microposts (#microposts2014) named entity extraction & linking challenge. In Proceedings of the 4th Workshop on Making Sense of Mi- croposts co-located with the 23rd International World Wide Web Conference (WWW'14), Seoul, Korea, pages 54–60, 2014.

[11] Richard Bellman. Dynamic programming. Princeton University Press, 1957.

[12] V. Richard Benjamins and Christine Pierret-Golbreich. Assumptions of problem-solving meth- ods. In Advances in Knowledge Acquisition, 9th European Knowledge Acquisition Workshop, co-located with the International Conference on Knowledge Engineering and Knowledge Man- agement, EKAW'96, Nottingham, UK, pages 1–16, 1996.

[13] Jon Louis Bentley. Multidimensional binary search trees used for associative searching. Com- mun. ACM, 18(9):509–517, 1975. doi: 10.1145/361002.361007.

[14] Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic web. Scientific American, 284 (5):34–43, 2001.

[15] Alina Beygelzimer, John Langford, Lihong Li, Lev Reyzin, and Robert E. Schapire. Contextual bandit algorithms with supervised learning guarantees. In Proceedings of the Fourteenth Inter- national Conference on Artificial Intelligence and Statistics, AISTATS 2011, Fort Lauderdale, USA, pages 19–26, 2011.

[16] Alina Beygelzimer, Anton Riabov, Daby Sow, Deepak S. Turaga, and Octavian Udrea. Big data exploration via automated orchestration of analytic workflows. In 10th International Conference on Autonomic Computing, ICAC'13, San Jose, CA, USA, pages 153–158, San Jose, CA, 2013. ISBN 978-1-931971-02-7.

[17] Alina Beygelzimer, Satyen Kale, and Haipeng Luo. Optimal and adaptive algorithms for online boosting. In IJCAI, pages 4120–4124, 2016.

[18] Christian Bizer, Tom Heath, and Tim Berners-Lee. Linked data - the story so far. Int. J. Semantic Web Inf. Syst., 5(3):1–22, 2009. doi: 10.4018/jswis.2009081901.

[19] Daniel Blankenberg, Gregory Von Kuster, Nathaniel Coraor, Guruprasad Ananda, Ross Lazarus, Mary Mangan, Anton Nekrutenko, and James Taylor. Galaxy: a web-based genome analysis tool for experimentalists. Current protocols in molecular biology, 2010. doi: 10.1002/0471142727. mb1910s89.

[20] Craig Boutilier. Sequential optimality and coordination in multiagent systems. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, IJCAI 99, Stockholm, Sweden, pages 478–485, 1999.

[21] Ronen I. Brafman and Moshe Tennenholtz. R-MAX - A general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3:213–231, 2002. doi: 10.1162/153244303765208377.

[22] Pavel Brazdil, Christophe Giraud-Carrier, Carlos Soares, and Ricardo Vilalta. Metalearning: Applications to Data Mining. Springer, 1 edition, 2008. ISBN 3540732624, 9783540732624.

182 Ei hitne,FacsoCrea rgMrdt,adSniaWeaaaa e services Web Weerawarana. Sanjiva and Meredith, Greg Curbera, Francisco Christensen, Erik [28] Schapire, E. Robert Helmbold, P. David Haussler, David Freund, Yoav Cesa-Bianchi, Nicolò [27] context- Probabilistic, Shan. Ming-Chien and Dayal, Umeshwar Castellanos, Malú Casati, Fabio [26] co- of learning reinforcement joint-action for Baselines Kudenko. Daniel and Carpenter In Martin [25] logic. similarity Probabilistic Getoor. Lise and Mihalkova, Lilyana Bröcheler, Matthias [24] forests. Random Breiman. Leo [23] XnLn og vei arlvc,Grm et,WloHr,KvnMrh,Shaohua Murphy, Kevin Horn, Wilko Heitz, Geremy Gabrilovich, Evgeniy function Dong, value Luna MAXQ Xin the [34] with learning reinforcement Hierarchical Dietterich. G. Thomas [33] data-efficient and model-based A PILCO: Rasmussen. Edward Carl and Deisenroth Peter Marc [32] Rajiv Maechling, Philip Callaghan, Scott Rynge, Mats Juve, Gideon Vahi, Karan Deelman, Ewa [31] relational Multi-agent Bruynooghe. Maurice and Ramon, Jan Tuyls, Karl Croonenborghs, Tom [30] cooperative in learning reinforcement of dynamics The Boutilier. Craig and Claus Caroline [29] 1010933404324. ecito agae(SL ..WCrcmedto,WC 01 URL 2001. W3C, recommendation, W3C w3.org/TR/wsdl 1.1. (WSDL) language description advice. expert use to How 10.1145/258128.258179. Warmuth. K. Manfred and USA NY, York, 1035167.1035213. New Conference, In International selection. Second service goal-oriented and sensitive, Learning Multi-Agent and In Adaptation systems. multi-agent cooperative in ordination Intelligence, Artificial USA in CA, Uncertainty Island, on Conference Catalina Twenty-Sixth the of Proceedings 2010, UAI u,adWiZag rmdt uint nweg fusion. knowledge to fusion data 10.14778/2732951.2732962. From doi: Zhang. Wei and Sun, decomposition. USA Washington, Bellevue, In 2011, ICML Learning, search. policy to approach Pegasus, Wenger. 10.1016/j.future.2014.10.008. Kent automation. doi: R. science and 2015. for Livny, 17–35, system Miron Silva, management workflow da a Ferreira Rafael Chen, Weiwei Mayani, Papers Selected Revised Netherlands, The Utrecht, LAMAS, Workshop, In learning. reinforcement IAAI 98, AAAI Conference, Intelligence USA Artificial Wisconsin, of Madison, Applications 98, Innovative Tenth and gence In systems. multiagent .Atf nel Res. Intell. Artif. J. cesdArl1,2015. 14, April accessed . rceig fteFfenhNtoa ofrneo rica Intelli- Artificial on Conference National Fifteenth the of Proceedings ae 38,2010. 73–82, pages , erigadAato nMliAetSses is International First Systems, Multi-Agent in Adaption and Learning ae 4–5,1998. 746–752, pages , rceig fte2t nentoa ofrneo Machine on Conference International 28th the of Proceedings 32733 00 o:10.1613/jair.639. doi: 2000. 13:227–303, , ahn Learning Machine ae 57,20.di 10.1007/978-3-540-32274-0_4. doi: 2005. 55–72, pages , dpieAet n ut-gn ytm II: Systems Multi-Agent and Agents Adaptive evc-retdCmuig-ISC2004, ICSOC - Computing Service-Oriented ae 6–7.Onpes 2011. Omnipress, 465–472. pages , 51:–2 01 o:10.1023/A: doi: 2001. 45(1):5–32, , ae 1–2,20.di 10.1145/ doi: 2004. 316–321, pages , .ACM J. uueGnrto op Syst. Comp. Generation Future PVLDB 43:2–8,19.doi: 1997. 44(3):427–485, , ae 9–0,2005. 192–206, pages , (0:8–9,2014. 7(10):881–892, , http://www. Bibliography 46: , 183

12 Bibliography

[35] Xin Luna Dong, Evgeniy Gabrilovich, Kevin Murphy, Van Dang, Wilko Horn, Camillo Lu- garesi, Shaohua Sun, and Wei Zhang. Knowledge-based trust: Estimating the trustworthiness of web sources. PVLDB, 8(9):938–949, 2015. doi: 10.14778/2777598.2777603.

[36] Prashant Doshi, Richard Goodwin, Rama Akkiraju, and Kunal Verma. Dynamic workflow com- position: Using markov decision processes. Int. J. Web Service Res., 2(1):1–17, 2005. doi: 10.4018/jwsr.2005010101.

[37] Kurt Driessens and Jan Ramon. Relational instance based regression for relational reinforcement learning. In Proceedings of the Twentieth International Conference (ICML 2003), August 21-24, 2003, Washington, DC, USA, pages 123–130, 2003.

[38] Saso Dzeroski, Luc De Raedt, and Kurt Driessens. Relational reinforcement learning. Machine Learning, 43(1/2):7–52, 2001. doi: 10.1023/A:1007694015589.

[39] Thomas Fahringer, Radu Prodan, Rubing Duan, Jüurgen Hofer, Farrukh Nadeem, Francesco Nerieri, Stefan Podlipnig, Jun Qin, Mumtaz Siddiqui, Hong-Linh Truong, Alex Villazon, and Marek Wieczorek. ASKALON: A Development and Grid Computing Environment for Scientific Workflows, pages 450–471. Springer London, 2007.

[40] Dieter Fensel, Enrico Motta, Frank van Harmelen, V. Richard Benjamins, Monica Crubézy, Stefan Decker, Mauro Gaspari, Rix Groenboom, William E. Grosso, Mark A. Musen, En- ric Plaza, Guus Schreiber, Rudi Studer, and Bob J. Wielinga. The unified problem-solving method development language UPML. Knowl. Inf. Syst., 5(1):83–131, 2003. doi: 10.1007/ s10115-002-0074-5.

[41] Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Tobias Springenberg, Manuel Blum, and Frank Hutter. Efficient and robust automated machine learning. In Advances in Neu- ral Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, Montreal, Quebec, Canada, pages 2962–2970, 2015.

[42] Roy Thomas Fielding. Architectural Styles and the Design of Network-based Software Archi- tectures. University of California, Irvine, 2000.

[43] Richard Fikes and Nils J. Nilsson. STRIPS: A new approach to the application of theorem proving to problem solving. Artif. Intell., 2(3/4):189–208, 1971. doi: 10.1016/0004-3702(71) 90010-5.

[44] Rosa Filguiera, Iraklis Klampanos, Amrey Krause, Mario David, Alexander Moreno, and Mal- colm Atkinson. dispel4py: A python framework for data-intensive scientific computing. In Proceedings of the 2014 International Workshop on Data Intensive Scalable Computing Sys- tems, DISCS '14, pages 9–16. IEEE Press, 2014.

[45] Jenny Rose Finkel, Trond Grenager, and Christopher D. Manning. Incorporating non-local in- formation into information extraction systems by gibbs sampling. In ACL 2005, 43rd Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, Uni- versity of Michigan, USA, 2005.

184 JkbN ortr ansM sal ad eFets n hmnWieo.Lann to Learning Whiteson. Shimon and Freitas, de Nando Assael, M. Yannis Foerster, N. Jakob [46] JwiHag igCe,Cun i,adJnin hn akn e evcswt limited with services web Ranking Chen. Junliang and Lin, Chuang Chen, Ying Huang, Jiwei [57] Pinkal, Manfred Fürstenau, Hagen Bordino, Ilaria Yosef, Amir Mohamed techniques. Hoffart, and Systems Johannes planning: [56] AI Drummond. Mark and Tate, Austin Hendler, composition. A. service James cloud [55] Agent-based Sim. Mong Kwang and Gutiérrez-García solution Octavio Efficient J. Venkataraman. [54] Shobha and Parr, Ronald Koller, Daphne Guestrin, Carlos [53] A Ratnakar. Varun and Moody, Joshua Kim, Jihie Gonzalez-Calero, A. Pedro Gil, Yolanda [52] Weber, Christian Götz, Michael Philipp, Patrick Maleshkova, Maria Gemmeke, Philipp [51] stan- abstractions, workflows: publishing for approach new A Gil. Yolanda and Garijo Daniel [50] machine. boosting gradient A approximation: function Greedy Friedman. H. Jerome [49] and learning on-line of generalization decision-theoretic A Schapire. E Robert and Freund Yoav [48] In regression. linear sparse Online Karloff. J. Howard and Kale, Satyen Foster, P. Dean [47] rcsigSses2:Ana ofrneo erlIfrainPoesn ytm '16, Systems Processing Information Neural on Conference Spain Annual Barcelona, 29: Systems In Processing learning. reinforcement multi-agent deep with communicate Statistics nhrg,A,USA AK, Anchorage, In information. noisy and Group Interest Special a ACL SIGDAT, of the meeting of A UK, Edinburgh, EMNLP, of Processing, disambiguation Language Robust In Weikum. text. in Gerhard entities and named Thater, Stefan Taneva, Bilyana Spaniol, Marc Magazine AI Intell. Appl. MDPs. factored for algorithms 10.1080/0952813X.2010.490962. doi: 2011. 389–467, data distributed catalogues. using component workflows and computational of generation automatic for framework semantic 13th Italy the Garda, del with Riva co-located 2014), In 2014) linked (ISWC Conference (COLD Web images. Using Data Semantic medical International Linked of Consuming Rettinger. pre-processing on Workshop the Achim International automating 5th and for apis Maier-Hein, web Klaus and data Nolden, Marco Kämpgen, Benedikt USA WA, Seattle, SC11, , with co-located Science, Large-Scale of In data. linked and dards, 0022-0000. ISSN boosting. to application an USA York, New 2016, COLT Theory, Learning on 2016. Conference 29th the of ings 918–22 2000. 29:1189–1232, , ae 8–9,2011. 782–792, pages , 83:3–6,21.di 10.1007/s10489-012-0380-x. doi: 2013. 38(3):436–464, , 12:17,1990. 11(2):61–77, , ae 1724,2016. 2137–2145, pages , ae 3–4,21.di 10.1109/ICWS.2014.94. doi: 2014. 638–645, pages , rceig fte21 ofrneo miia ehd nNatural in Methods Empirical on Conference 2011 the of Proceedings OK'1 rceig fte6hWrso nWrflw nSupport in Workflows on Workshop 6th the of Proceedings WORKS'11, 04IE nentoa ofrneo e evcs CS 2014, ICWS, Services, Web on Conference International IEEE 2014 ora fEprmna hoeia rica Intelligence Artificial Theoretical & Experimental of Journal ora fCmue n ytmSciences System and Computer of Journal .Atf nel Res. Intell. Artif. J. 93948 03 o:10.1613/jair.1000. doi: 2003. 19:399–468, , dacsi erlInformation Neural in Advances ae 75,2011. 47–56, pages , 51:1 3,1997. 139, – 55(1):119 , ae 53,2014. 25–36, pages , rceig fthe of Proceedings ae 960–970, pages , Bibliography nasof Annals Proceed- 23(4): , 185

12 Bibliography

[58] Robert A Jacobs, Michael I Jordan, Steven J Nowlan, and Geoffrey E Hinton. Adaptive mixtures of local experts. Neural computation, 3(1):79–87, 1991. doi: 10.1162/neco.1991.3.1.79.

[59] Anubhav Jain, Shyue Ping Ong, Wei Chen, Bharat Medasani, Xiaohui Qu, Michael Kocher, Miriam Brafman, Guido Petretto, Gian-Marco Rignanese, Geoffroy Hautier, Daniel K. Gunter, and Kristin A. Persson. Fireworks: a dynamic workflow system designed for high-throughput applications. Concurrency and Computation: Practice and Experience, 27(17):5037–5059, 2015. doi: 10.1002/cpe.3505.

[60] Joseph P. Joyce and George W. Lapinsky. A history and overview of the safety parameter display system concept. Nuclear Science, IEEE Transactions on, 30(1):744–749, Feb 1983. ISSN 0018- 9499.

[61] Spiros Kapetanakis and Daniel Kudenko. Reinforcement learning of coordination in coopera- tive multi-agent systems. In Proceedings of the Eighteenth National Conference on Artificial Intelligence and Fourteenth Conference on Innovative Applications of Artificial Intelligence, Edmonton, Alberta, Canada, pages 326–331, 2002.

[62] Spiros Kapetanakis, Daniel Kudenko, and Malcolm J. A. Strens. Learning to coordinate using commitment sequences in cooperative multiagent systems. In Adaptive Agents and Multi-Agent Systems II: Adaptation and Multi-Agent Learning, pages 106–118, 2005. doi: 10.1007/978-3-540-32274-0_7.

[63] D. Katic, A. L. Wekerle, F. Gärtner, H. G. Kenngott, B. P. Müller-Stich, R. Dillmann, and S. Speidel. Knowledge-driven formalization of laparoscopic surgeries for rule-based intraoper- ative context-aware assistance. In Information Processing in Computer-Assisted Interventions - 5th International Conference, IPCAI'14, Fukuoka, Japan, pages 158–167, 2014.

[64] Michael J. Kearns and Satinder P. Singh. Near-optimal reinforcement learning in polyno- mial time. Journal of Machine Learning Research, 49(2-3):209–232, 2002. doi: 10.1023/A: 1017984413808.

[65] Kristian Kersting and Luc De Raedt. Logical markov decision programs. In Working Notes of the International Joint Conference on Artificial Intelligence, IJCAI, Workshop on Learning Statistical Models from Relational Data (SRL'03), pages 63–70, 2003.

[66] Kristian Kersting, Martijn van Otterlo, and Luc De Raedt. Bellman goes relational. In Machine Learning, Proceedings of the Twenty-first International Conference (ICML'04), Banff, Alberta, Canada, 2004. doi: 10.1145/1015330.1015401.

[67] Angelika Kimmig, Stephen Bach, Matthias Broecheler, Bert Huang, and Lise Getoor. A short introduction to probabilistic soft logic. In Workshop on Probabilistic Programming: Founda- tions and Applications, co-located with Conference on Neural Information Processing Systems, pages 1–4, 2012.

[68] Angelika Kimmig, Lilyana Mihalkova, and Lise Getoor. Lifted graphical models: a survey. Machine Learning, 99(1):1–45, 2015. ISSN 1573-0565.

186 JckKpcý oa ivr aieBunz n olFrel ASL eatcanno- semantic SAWSDL: Farrell. Joel and Bournez, Carine Vitvar, Tomas Kopecký, Jacek [72] Friedman. Nir and Koller Daphne [71] coordination. service web Semantic Klusch. Matthine [70] composition service web Semantic Schmidt. Marcus and Gerber, Andreas Klusch, Matthias [69] Jh agod iogL,adTn hn.Sas nielann i rnae gradient. truncated via learning online Sparse Zhang. Tong and Li, Lihong Langford, John [81] Riedmiller. Martin and Gabel, Thomas Lange, Sascha [80] knowledge-based Automatic Jannin. Pierre and Riffaud, Laurent Bouget, David Lalys, Florent [79] In mediawiki. Semantic Völkel. Max and Vrandecic, Denny Krötzsch, Markus [78] with learning reinforcement PAC Langford. John and Agarwal, Alekh Krishnamurthy, Akshay [77] Wil- Dirk Gillen, Sonja Schneider, Armin Fiolka, Adam Staub, Christoph Kranzfelder, Michael [76] survey. A problems: search combinatorial for selection Algorithm Kotthoff. Lars [75] organ- agent Self-organising Jennings. R. Nicholas and Gibbins, Nicholas Kota, for Ramachandra microformat [74] HTML An hrests: Vitvar. Tomas and Gomadam, Karthik Kopecky, Jacek [73] ain o SLadXLschema. XML and WSDL for tations 978-0-262-01319-2. ISBN 2009. Press, MIT web semantic the in tion Web Semantic the and Agents on (AAAI) Intelligence Artificial of ment In owls-xplan. with planning 10.1109/MIC.2007.134. ora fMcieLann Research Learning Machine of Journal 10.1007/978-3-642-27645-3_2. doi: 978-3-642-27645-3. ISBN 2012. Springer, 45–73. Surgery and procedures. Radiology Assisted ophthalmological puter in tasks low-level of recognition USA GA, 10.1007/11926078_68. Athens, doi: ISWC'06, Conference, 2006. Web 935–942, Semantic International 5th ISWC'06, - Web Spain Barcelona, '16, Systems Processing Information Neural on ence In observations. rich the in autonomy increased Toward 0930-2794. expectations. Feussner. and Hubertus requests, needs, and OR: Knoll, surgical Alois Friess, Helmut helm, 2014. 35(3):48–60, 2 Volume 1558122. Hungary, Budapest, (AAMAS'09), In isations. Australia NSW, Sydney, WI'08, gence, In services. web restful describing t nentoa on ofrneo uooosAet n utaetSystems Multiagent and Agents Autonomous on Conference Joint International 8th dacsi erlIfrainPoesn ytm 9 nulConfer- Annual 29: Systems Processing Information Neural in Advances ae 914 08 o:10.1007/978-3-7643-8575-0_3. doi: 2008. 59–104, pages , nentoa alSmoimo h soito o h Advance- the for Association the of Symposium Fall International rbblsi rpia oes-Picpe n Techniques and Principles - Models Graphical Probabilistic EE/WC/AMItrainlCneec nWbIntelli- Web on Conference International ACM / WIC / IEEE oue1 ae 1–2.IE,2008. IEEE, 619–625. pages 1, volume , ()3–9 03 SN1861-6410. ISSN 2013. 8(1):39–49, , 07781 09 o:10.1145/1577069.1577097. doi: 2009. 10:777–801, , EEItre Computing Internet IEEE ae 9–0,20.di 10.1145/1558109. doi: 2009. 797–804, pages , ug Endoscopy Surg. ACM nelgn evc coordina- service Intelligent CASCOM: ac enocmn Learning Reinforcement Batch 75:6118,21.ISSN 2013. 27(5):1681–1688, , nentoa ora fCom- of Journal International 16:06,20.doi: 2007. 11(6):60–67, , ae 8014,2016. 1840–1848, pages , ae 56,2005. 55–62, pages , Bibliography h Semantic The IMagazine AI pages , pages , 187 . ,

12 Bibliography

[82] Romain Laroche, Mehdi Fatemi, Joshua Romoff, and Harm van Seijen. Multi-advisor rein- forcement learning. CoRR, abs/1704.00756, 2017. URL http://arxiv.org/abs/1704. 00756.

[83] Martin Lauer and Martin A. Riedmiller. An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In An algorithm for distributed reinforcement learning in cooperative multi-agent systems, pages 535–542, 2000.

[84] Yann LeCun, Yoshua Bengio, and Geoffrey E. Hinton. Deep learning. Nature, 521(7553): 436–444, 2015. doi: 10.1038/nature14539.

[85] Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, Sören Auer, and Chris- tian Bizer. Dbpedia - A large-scale, multilingual knowledge base extracted from wikipedia. Semantic Web, 6(2):167–195, 2015.

[86] Lihong Li, Michael L. Littman, and Christopher R. Mansley. Online exploration in least-squares policy iteration. In 8th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2009), Budapest, Hungary, Volume 2, pages 733–739, 2009. doi: 10.1145/ 1558109.1558113.

[87] Lihong Li, Michael L. Littman, Thomas J. Walsh, and Alexander L. Strehl. Knows what it knows: a framework for self-aware learning. Machine Learning, 82(3):399–443, 2011. doi: 10.1007/s10994-010-5225-4.

[88] Nebil Ben Mabrouk, Nikolaos Georgantas, and Valérie Issarny. Set-based bi-level opti- misation for qos-aware service composition in ubiquitous environments. In IEEE Interna- tional Conference on Web Services, ICWS'15, New York, NY, USA, pages 25–32, 2015. doi: 10.1109/ICWS.2015.14.

[89] David L. Martin, Mark H. Burstein, Drew V. McDermott, Sheila A. McIlraith, Massimo Paolucci, Katia P. Sycara, Deborah L. McGuinness, Evren Sirin, and Naveen Srinivasan. Bring- ing semantics to web services with OWL-S. World Wide Web, 10(3):243–277, 2007. doi: 10.1007/s11280-007-0033-x.

[90] Francisco S. Melo and Manuela M. Veloso. Learning of coordination: exploiting sparse inter- actions in multiagent systems. In 8th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS'09), Budapest, Hungary, Volume 2, pages 773–780, 2009. doi: 10.1145/1558109.1558118.

[91] Pablo N. Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. DBpedia spotlight: shedding light on the web of documents. In Proceedings the 7th International Conference on Semantic Systems, I-SEMANTICS'11, Graz, Austria, pages 1–8, 2011.

[92] Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural In- formation Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems '13, Lake Tahoe, Nevada, USA, pages 3111–3119, 2013.

[93] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin A. Riedmiller, Andreas Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015. doi: 10.1038/nature14236.

[94] Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In Proceedings of the 33rd International Conference on Machine Learning, ICML'16, New York City, NY, USA, pages 1928–1937, 2016.

[95] Thomas Neumuth, Gero Strauß, Jürgen Meixensberger, Heinz U. Lemke, and Oliver Burgert. Acquisition of process descriptions from surgical interventions. In Database and expert systems applications, volume 4080 of Lec. Not. in Comp. Sc., pages 602–611. Springer, 2006.

[96] Thomas Neumuth, Pierre Jannin, Gero Strauss, Juergen Meixensberger, and Oliver Burgert. Validation of knowledge acquisition for surgical process models. Journal of the American Medical Informatics Association, 16(1):72–80, 2009. ISSN 1067-5027.

[97] Tom Oinn, Mark Greenwood, Matthew Addis, M. Nedim Alpdemir, Justin Ferris, Kevin Glover, Carole Goble, Antoon Goderis, Duncan Hull, Darren Marvin, Peter Li, Phillip Lord, Matthew R. Pocock, Martin Senger, Robert Stevens, Anil Wipat, and Chris Wroe. Taverna: Lessons in creating a workflow environment for the life sciences: Research articles. Concurr. Comput.: Pract. Exper., 18(10):1067–1100, August 2006. ISSN 1532-0626.

[98] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. Glove: Global vectors for word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP'14, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, pages 1532–1543, 2014.

[99] Jan Peters and Stefan Schaal. Policy gradient methods for robotics. In IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2006, Beijing, China, pages 2219–2225, 2006. doi: 10.1109/IROS.2006.282564.

[100] Patrick Philipp and Achim Rettinger. Reinforcement learning for multi-step expert advice. In Proceedings of the 16th Conference on Autonomous Agents and Multiagent Systems, AAMAS'17, São Paulo, Brazil, pages 962–971, 2017.

[101] Patrick Philipp, Maria Maleshkova, Achim Rettinger, and Darko Katic. A semantic framework for sequential decision making. In Engineering the Web in the Big Data Era - 15th International Conference, ICWE'15, Rotterdam, The Netherlands, pages 392–409, 2015. doi: 10.1007/978-3-319-19890-3_25.

[102] Patrick Philipp, Maria Maleshkova, Darko Katic, Christian Weber, Michael Götz, Achim Rettinger, Stefanie Speidel, Benedikt Kämpgen, Marco Nolden, Anna-Laura Wekerle, Hannes Kenngott, Beat P. Müller-Stich, Rüdiger Dillmann, and Rudi Studer. Toward cognitive pipelines of medical assistance algorithms. Int. J. Computer Assisted Radiology and Surgery, 11(9):1743–1753, 2016. doi: 10.1007/s11548-015-1322-y.

[103] Patrick Philipp, Maria Maleshkova, Achim Rettinger, and Darko Katic. A semantic framework for sequential decision making. J. Web Eng., 16(5&6):471–504, 2017.

[104] Patrick Philipp, Achim Rettinger, and Maria Maleshkova. On automating decentralized multi- step service combination. In IEEE International Conference on Web Services, ICWS'17, Hon- olulu, HI, USA, pages 736–743, 2017. doi: 10.1109/ICWS.2017.89.

[105] Emmanouil Antonios Platanios, Avrim Blum, and Tom M. Mitchell. Estimating accuracy from unlabeled data. In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, UAI 2014, Quebec City, Quebec, Canada, pages 682–691, 2014.

[106] Emmanouil Antonios Platanios, Avinava Dubey, and Tom M. Mitchell. Estimating accuracy from unlabeled data: A bayesian approach. In Proceedings of the 33rd International Conference on Machine Learning, ICML'16, New York City, NY, USA, pages 1416–1425, 2016.

[107] Marc J. V. Ponsen, Tom Croonenborghs, Karl Tuyls, Jan Ramon, and Kurt Driessens. Learning with whom to communicate using relational reinforcement learning. In 8th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2009), Budapest, Hungary, Volume 2, pages 1221–1222, 2009. doi: 10.1145/1558109.1558222.

[108] Frank Puppe. Systematic introduction to expert systems - knowledge representations and problem-solving methods. Springer, 1993. ISBN 978-3-540-56255-9.

[109] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience, New York, USA, 1st edition, 1994. ISBN 0-471-61977-9.

[110] Jinghai Rao and Xiaomeng Su. A survey of automated web service composition methods. In Semantic Web Services and Web Process Composition, First International Workshop, SWSWPC'04, San Diego, CA, USA, Revised Selected Papers, pages 43–54, 2004. doi: 10.1007/978-3-540-30581-1_5.

[111] John R. Rice. The algorithm selection problem. Advances in Computers, 15:65–118, 1976. doi: 10.1016/S0065-2458(08)60520-3.

[112] Dumitru Roman, Uwe Keller, Holger Lausen, Jos de Bruijn, Rubén Lara, Michael Stollberg, Axel Polleres, Cristina Feier, Christoph Bussler, and Dieter Fensel. Web service modeling ontology. Applied Ontology, 1(1):77–106, 2005.

[113] Pablo Ruiz and Thierry Poibeau. Combining open source annotators for entity linking through weighted voting. In Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics, *SEM'15, Denver, Colorado, USA, 2015.

[114] Nicolai Schoch, Patrick Philipp, Tobias Weller, Sandy Engelhardt, Mykola Volovyk, Andreas Fetzer, Marco Nolden, Raffaele De Simone, Ivo Wolf, Maria Maleshkova, Achim Rettinger, Rudi Studer, and Vincent Heuveline. Cognitive tools pipeline for assistance of mitral valve surgery. In Proceedings of SPIE, Medical Imaging'16: Image-Guided Procedures, Robotic Interventions and Modeling, pages 9786 – 9794, 2016. doi: 10.1117/12.2216059.

[115] John Schulman, Sergey Levine, Pieter Abbeel, Michael I. Jordan, and Philipp Moritz. Trust region policy optimization. In Proceedings of the 32nd International Conference on Machine Learning, ICML'15, Lille, France, pages 1889–1897, 2015.

[116] Kwang Mong Sim. Agent-based cloud computing. IEEE Trans. Services Computing, 5(4):564–577, 2012. doi: 10.1109/TSC.2011.52.

[117] Evren Sirin, James A. Hendler, and Bijan Parsia. Semi-automatic composition of web services using semantic descriptions. In Web Services: Modeling, Architecture and Infrastructure, Proceedings of the 1st Workshop (WSMAI'03), in conjunction with ICEIS'03, Angers, France, pages 17–24, 2003.

[118] Evren Sirin, Bijan Parsia, Dan Wu, James A. Hendler, and Dana S. Nau. HTN planning for web service composition using SHOP2. J. Web Sem., 1(4):377–396, 2004. doi: 10.1016/j.websem.2004.06.005.

[119] René Speck and Axel-Cyrille Ngonga Ngomo. Ensemble learning for named entity recognition. In The Semantic Web - ISWC'14 - 13th International Semantic Web Conference, Riva del Garda, Italy, Proceedings, Part I, pages 519–534, 2014. doi: 10.1007/978-3-319-11964-9_33.

[120] Stefanie Speidel, Julia Benzko, Sebastian Krappe, Gunther Sudra, Pedram Azad, Beat Peter Müller-Stich, Carsten Gutt, and Rüdiger Dillmann. Automatic classification of minimally invasive instruments based on endoscopic image sequences. In Proceedings of SPIE, Medical Imaging'09: Visualization, Image-Guided Procedures, and Modeling, volume 7261, 2009.

[121] Sebastian Speiser and Andreas Harth. Integrating linked data and services with linked data services. In The Semantic Web: Research and Applications - 8th Extended Semantic Web Conference, ESWC'11, Heraklion, Crete, Proceedings, Part I, pages 170–184. Springer, 2011. doi: 10.1007/978-3-642-21034-1_12.

[122] Steffen Stadtmüller and Barry Norton. Scalable discovery of linked apis. International Journal of Metadata, Semantics and Ontologies (IJMSO), 8(2):95–105, 2013. doi: 10.1504/IJMSO.2013.056603.

[123] Steffen Stadtmüller, Sebastian Speiser, Andreas Harth, and Rudi Studer. Data-fu: A language and an interpreter for interaction with read/write linked data. In 22nd International World Wide Web Conference, WWW '13, Rio de Janeiro, Brazil, pages 1225–1236, 2013.

[124] Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, and Michael L. Littman. PAC model-free reinforcement learning. In Machine Learning, Proceedings of the Twenty-Third International Conference (ICML'06), Pittsburgh, Pennsylvania, USA, pages 881–888, 2006. doi: 10.1145/1143844.1143955.

[125] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: a core of semantic knowledge. In WWW, pages 697–706, 2007.

[126] Richard S. Sutton and Andrew G. Barto. Reinforcement learning - an introduction. Adaptive computation and machine learning. MIT Press, 1998. ISBN 0262193981.

[127] Stefan Suwelack, Markus Stoll, Sebastian Schalck, Nicolai Schoch, Rüdiger Dillmann, Rolf Bendl, Vincent Heuveline, and Stefanie Speidel. The medical simulation markup language - simplifying the biomechanical modeling workflow. In Medicine Meets Virtual Reality 21 - NextMed, MMVR 2014, Manhattan Beach, California, pages 394–400, 2014. doi: 10.3233/978-1-61499-375-9-394.

[128] Chris Thornton, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. Auto-weka: combined selection and hyperparameter optimization of classification algorithms. In The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'13, Chicago, IL, USA, pages 847–855, 2013. doi: 10.1145/2487575.2487629.

[129] J. Tong, E. Haihong, M. Song, J. Song, and Y. Li. Web service QoS prediction under sparse data via local link prediction. In 10th IEEE International Conference on High Performance Computing and Communications & IEEE International Conference on Embedded and Ubiquitous Computing, HPCC/EUC'13, Zhangjiajie, China, pages 2285–2290, 2013. doi: 10.1109/HPCC.and.EUC.2013.328.

[130] Ricardo Usbeck, Axel-Cyrille Ngonga Ngomo, Michael Röder, Daniel Gerber, Sandro Athaide Coelho, Sören Auer, and Andreas Both. AGDISTIS - graph-based disambiguation of named entities using linked data. In The Semantic Web - ISWC'14 - 13th International Semantic Web Conference, Riva del Garda, Italy, Proceedings, Part I, pages 457–471, 2014. doi: 10.1007/978-3-319-11964-9_29.

[131] Ricardo Usbeck, Michael Röder, Axel-Cyrille Ngonga Ngomo, Ciro Baron, Andreas Both, Martin Brümmer, Diego Ceccarelli, Marco Cornolti, Didier Cherix, Bernd Eickmann, Paolo Ferragina, Christiane Lemke, Andrea Moro, Roberto Navigli, Francesco Piccinno, Giuseppe Rizzo, Harald Sack, René Speck, Raphaël Troncy, Jörg Waitelonis, and Lars Wesemann. GERBIL: general entity annotator benchmarking framework. In Proceedings of the 24th International Conference on World Wide Web, WWW'15, Florence, Italy, pages 1133–1143, 2015. doi: 10.1145/2736277.2741626.

[132] Leslie G. Valiant. A theory of the learnable. Commun. ACM, 27(11):1134–1142, 1984. doi: 10.1145/1968.1972.

[133] Hado Van Hasselt and Marco A. Wiering. Reinforcement learning in continuous action spaces. In IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pages 272–279. IEEE, 2007.

[134] Harm van Seijen, Mehdi Fatemi, Joshua Romoff, and Romain Laroche. Improving scalability of reinforcement learning by separation of concerns. CoRR, abs/1612.05159, 2016. URL http://arxiv.org/abs/1612.05159.

[135] Ruben Verborgh, Andreas Harth, Maria Maleshkova, Steffen Stadtmüller, Thomas Steiner, Mohsen Taheriyan, and Rik Van de Walle. Survey of Semantic Description of REST APIs, pages 69–89. Springer New York, 2014. doi: 10.1007/978-1-4614-9299-3_5.

[136] Ricardo Vilalta and Youssef Drissi. A perspective view and survey of meta-learning. Artificial Intelligence Review, 18(2):77–95, 2002. doi: 10.1023/A:1019956318069.

[137] Tomas Vitvar, Jacek Kopecký, Jana Viskova, and Dieter Fensel. Wsmo-lite annotations for web services. In The Semantic Web: Research and Applications, 5th European Semantic Web Conference, ESWC'08, Tenerife, Canary Islands, Spain, pages 674–689, 2008. doi: 10.1007/978-3-540-68234-9_49.

[138] Denny Vrandecic and Markus Krötzsch. Wikidata: a free collaborative knowledgebase. Commun. ACM, 57(10):78–85, 2014. doi: 10.1145/2629489.

[139] Thomas J. Walsh, Istvan Szita, Carlos Diuk, and Michael L. Littman. Exploring compact reinforcement-learning representations with linear regression. In UAI'09, Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, pages 591–598, 2009.

[140] Christopher J. C. H. Watkins and Peter Dayan. Technical note q-learning. Machine Learning, 8:279–292, 1992. doi: 10.1007/BF00992698.

[141] Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256, 1992. doi: 10.1007/BF00992696.

[142] Ivo Wolf, Marcus Vetter, Ingmar Wegner, Thomas Böttger, Marco Nolden, Max Schöbinger, Mark Hastenteufel, Tobias Kunert, and Hans-Peter Meinzer. The medical imaging interaction toolkit. Medical Image Analysis, 9(6):594–604, 2005. ISSN 1361-8415.

[143] David H. Wolpert. Stacked generalization. Neural networks, 5(2):241–259, 1992. doi: 10.1016/S0893-6080(05)80023-1.

[144] Ian Wood, Benjamin P. Vandervalk, Luke E. McCarthy, and Mark D. Wilkinson. OWL-DL domain-models as abstract workflows. In Leveraging Applications of Formal Methods, Verification and Validation. Applications and Case Studies - 5th International Symposium, ISoLA'12, Heraklion, Crete, Greece, Proceedings, Part II, pages 56–66. Springer, 2012. doi: 10.1007/978-3-642-34032-1_6.

[145] S. S. Yau and Y. Yin. QoS-based service ranking and selection for service-based systems. In IEEE International Conference on Services Computing, SCC'11, Washington, DC, USA, pages 56–63, 2011. doi: 10.1109/SCC.2011.114.

[146] Chih-Han Yu, Justin Werfel, and Radhika Nagpal. Collective decision-making in multi-agent systems by implicit leadership. In 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS'10), Toronto, Canada, Volume 1-3, pages 1189–1196, 2010. doi: 10.1145/1838186.1838192.

[147] Zibin Zheng, Xinmiao Wu, Yilei Zhang, Michael R. Lyu, and Jianmin Wang. Qos ranking prediction for cloud services. IEEE Trans. Parallel Distrib. Syst., 24(6):1213–1222, 2013. doi: 10.1109/TPDS.2012.285.

[148] Y. Zhou, L. Liu, C. S. Perng, A. Sailer, I. Silva-Lepe, and Z. Su. Ranking services by service network structure and service attributes. In IEEE 20th International Conference on Web Services, Santa Clara, CA, USA, pages 26–33, 2013.

[149] Zhi-Hua Zhou. Ensemble methods: foundations and algorithms. Chapman & Hall/CRC, 1st edition, 2012. ISBN 1439830037, 9781439830031.

List of Figures

1.1 General schema for using algorithms to solve multi-step tasks.
1.2 Learning problem of multi-step tasks.
1.3 Automation problem for solving multi-step tasks using algorithms to solve steps.
1.4 General approach for solving multi-step tasks with experts.
2.1 NLP example for NERD.
2.2 An example of a MRI headscan.
2.3 An example of a TPM.
2.4 Steps for TPM experts.
5.1 Multi-Step Expert Advice in an Expert Process.
5.2 Schema of EPs with a) the formal process and b) the MDP framework.
6.1 Schema of Multi-Step Expert Advice in a MAS.
6.2 Schema of Multi-Step Expert Advice in the MDP framework with an arbitrary mechanism to choose joint actions.
6.3 Schema of Multi-Step Expert Advice in the MDP framework with the ECP coordination protocol.
6.4 The ECP active within a MEP.
8.1 Schematic overview of automation components.
8.2 An exemplary semantic image conversion expert.
8.3 A LD-Fu rule for the semantic image conversion expert.
8.4 Interactions of semantic meta components.
9.1 The SMW form to annotate experts for medical assistance.
9.2 The automation architecture for medical assistance.

List of Tables

5.1 Single-step expert meta dependencies.
5.2 Intra-step expert meta dependencies.
5.3 Inter-step expert meta dependencies.
7.1 Used kernels for evaluation.
7.2 Experts for NER & NED.
7.3 Baseline performances of experts.
7.4 Evaluation results for accuracy estimation.
7.5 Evaluation results for a) accuracy estimation with b) outcome optimization for h = 1.
7.6 Evaluation results for a) accuracy estimation with b) outcome optimization for h = 2.
8.1 Minimal description for semantic experts.
9.1 Needed inputs and generated outputs for TPM.
9.2 Evaluation results for local versus semantic expert executions.
9.3 Evaluation results for abstract semantic planning.
9.4 Baseline performances of phase recognition experts.
9.5 Evaluation results for the grounded semantic learner.
9.6 Evaluation results for the NERD SEP.

List of Abbreviations

AGDT AGDISTIS Tagger. 106, 114
AIDT AIDA Tagger. 106, 114
API Application Programming Interface. 48, 119
AutoML Automatic Machine Learning. 52
BGD Batch Gradient Descent. 31
BGP Basic Graph Pattern. 46, 47, 48, 55, 121, 122, 123, 124, 128, 130, 135, 138, 148, 156
BOW Bag-of-Words. 68
CDP Contextual Decision Process. 51, 61
CT Computed Tomography. 18, 19, 20
ECP Expert Coordination Process. 84, 85, 89, 90, 91, 92, 96, 97, 98, 99, 101, 114
EP Expert Process. 12, 13, 15, 16, 51, 52, 56, 59, 61, 62, 64, 65, 66, 67, 68, 73, 74, 75, 76, 77
EWH Exponential Weights / Hedge. 36, 37, 63, 64, 74, 75, 76, 77, 81, 82, 84, 93, 94, 101, 107, 109
FMA Foundational Model of Anatomy. 138
FOAF Friend-of-a-Friend. 45, 46
GD Gradient Descent. 31, 92, 93, 99, 100, 107, 109, 111, 112, 114, 115
GERBIL General Entity Annotation Benchmark Framework. 17, 105, 106
HLMRF Hinge-Loss Markov Random Field. 78, 81
HRL Hierarchical Reinforcement Learning. 53
HTN Hierarchical Task Network. 56
HTTP Hypertext Transfer Protocol. 4, 11, 47, 48, 119, 127, 130, 131, 138, 176
i.i.d. Independent and identically distributed. 28, 29, 31, 33, 35, 52, 63
IoT Internet of Things. 179
IRL Inverse Reinforcement Learning. 177, 178
JavaEE Java Enterprise Edition. 151
JAX-RS Java API for RESTful Web Services. 151
JSON-LD JavaScript Object Notation for Linked Data. 57
KB (Structured-) Knowledge Base. 4, 17, 47, 58, 63, 126, 127, 128, 129, 132, 135, 137, 138, 143, 160
KWIK Knows-What-It-Knows. 177
LAPI Linked Application Programming Interface. 13, 48, 55, 119, 120, 121, 127, 129, 130, 134, 135
LD-Fu Linked Data-Fu. 120, 130, 131, 135, 146, 152, 158, 161, 164, 165, 176, 179, 195

LGM Lifted Graphical Model. 32, 175 LOMDP Logical Markov Decision Process. 56, 179 LSH Locality Sensitive Hashing. 73

MAS Multiagent System. 11, 12, 14, 15, 42, 44, 51, 53, 54, 83, 84, 85 MDP Markov Decision Process. 6, 12, 13, 26, 38, 39, 40, 41, 42, 43, 44, 54, 55, 56, 57, 61, 64, 65, 66, 68, 74, 81, 84, 86, 89, 144, 145, 146, 148, 152, 155, 165, 169, 170, 175, 176, 178, 179, 195 MEP Multiagent Expert Process. 12, 15, 16, 53, 54, 59, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 96, 97, 99, 100, 101, 102, 103, 104, 106, 107, 108, 112, 115, 116, 168, 175, 176, 178 MeSH Medical Subject Headings. 138 MHA MetaImage. 19, 139 MITK Medical Imaging Interaction Toolkit. 19, 119, 139, 151 ML Machine Learning. 153, 156, 157, 158, 159 MLN Markov Logic Network. 80 MMDP Multiagent Markov Decision Process. 42, 43, 44, 54, 84, 86, 90 MPE Most Probable Explanation. 80 MRG Multi-Relational Graph. 32 MRI Magnetic Resonance Imaging. 18, 19, 20, 195 MSM Minimal Service Model. 55, 128 MSML Medical Simulation Markup Language. 119

N3 Notation3. 46, 130 NED Named Entity Disambiguation. 4, 7, 9, 13, 17, 18, 63, 84, 103, 105, 106, 108, 113, 115, 121, 160, 161, 162, 163, 164, 165, 170 NER Named Entity Recognition. 4, 5, 13, 17, 18, 62, 63, 68, 84, 103, 105, 106, 108, 113, 115, 121, 160, 161, 162, 163, 165, 170 NERD Named Entity Recognition- and Disambiguation. 13, 15, 16, 17, 18, 25, 59, 62, 63, 64, 84, 86, 103, 105, 108, 115, 116, 121, 123, 124, 134, 135, 137, 156, 160, 165, 168, 170, 171, 176, 195 NIF Natural Language Processing Interchange Format. 160 NLP Natural Language Processing. 3, 4, 13, 17, 25, 52, 56, 57, 62, 68, 115, 117, 120, 121, 122, 124, 137, 160, 162, 169, 170, 195 NN Nearest Neighbors. 73 NRRD Nearly Raw Raster Data. 19, 139, 152

OPM Open Provenance Model. 55 OPMW Open Provenance Model for Workflows. 55, 57, 179 OWL Web Ontology Language. 45, 55, 56, 57

PAC Probably Approximately Correct. 177 PACS Picture Archiving and Communication System. 19 PGM Probabilistic Graphical Model. 32, 42 PNG Portable Network Graphics. 128, 139 POS Part-of-Speech. 104 PSL Probabilistic Soft Logic. 63, 64, 77, 78, 80, 81, 82, 107, 113, 115 PSM Problem Solving Methods. 55, 56

QoS Quality-of-Service. 53

RadLex Radiology Lexicon. 138 RDF Resource Description Framework. 11, 44, 45, 46, 47, 48, 56, 57, 121, 122, 123, 124, 125, 126, 128, 135, 138, 145, 146, 151, 160, 176

RDFS Resource Description Framework Schema. 45
REST Representational State Transfer. 48, 143
RL Reinforcement Learning. 6, 10, 11, 12, 13, 15, 40, 41, 44, 51, 52, 53, 54, 57, 59, 61, 63, 64, 65
RMDP Relational Markov Decision Process. 52, 56
RRL Relational Reinforcement Learning. 52, 54
SAS Single-Agent System. 51, 83, 93, 114
SAWSDL Semantic Annotations for Web Service Description Language. 55
SEP Semantic Expert Process. 13, 15, 16, 56, 57, 117, 119, 120, 121, 122, 123, 124, 125, 126, 127
SGD Stochastic Gradient Descent. 31, 92
SLA Service Level Agreement. 56
SMW Semantic MediaWiki. 138
SoC Separation of Concerns. 53, 83, 84
SPARQL SPARQL Protocol and Resource Description Framework Query Language. 46, 47, 48, 57
SPS Spotlight Spotter. 106
SPT Spotlight Tagger. 106
SQL Structured Query Language for Relational Database Systems. 46
SRL Statistical Relational Learning. 32, 77, 109, 113, 115
ST Stanford Tagger. 106, 114
STRIPS Stanford Research Institute Problem Solver. 55, 56
SWRL Semantic Web Rule Language. 156, 159
TPM Tumor Progression Mapping. 17, 19, 20, 121, 124, 133, 137, 138, 139, 141, 143, 144, 145, 146
TRPO Trust Region Policy Optimization. 177
TURTLE Terse Resource Description Framework Triple Language. 45, 46
UPML Unified Problem Solving Method Language. 56
URI Uniform Resource Identifier. 44, 45, 47, 126, 131
WSDL Web Service Description Language. 55
WSMO Web Service Modeling Ontology. 55
WWW World Wide Web. 44
XML Extensible Markup Language. 56

A Appendix: Semantics

Based on the contents presented in the thesis, we now provide additional materials with regards to (i) the semantic descriptions which lift experts to semantic experts and (ii) the linked programs of the semantic agent which handle the execution of experts for the Web-based infrastructure for SEPs.

A.1 Semantic Experts

We present the full semantic descriptions for all semantic experts of the TPM task. To better present the descriptions, we first gather all used prefixes.

1 @prefix dc: .
2 @prefix msm: .
3 @prefix owl: .
4 @prefix rdf: .
5 @prefix rdfs: .
6 @prefix sawsdl: .
7 @prefix sp: .
8 @prefix spc: .
9 @prefix spp: .
10 @prefix spimg: <http://surgipedia.sfb125.de/images> .
11 @prefix nif: .
12 @prefix sep: .
13 @prefix sepnlp: .
14 @prefix sepkb: <http://aifb-ls3vm2.kit.edu:8080/experts/sepkbid> .
15 @prefix sparql: .
16 @prefix exp: .
17 @prefix xsd: .

Listing A.1: Prefixes for semantic descriptions for semantic experts.

A.1.1 Brain Mask Generation

1 exp:BMG
2 rdf:type sep:SemanticExpert, spc:BrainMaskGeneration;
3 rdfs:label "Semantic Brain Mask Generation Expert";
4 spp:contributor sp:Patrick_Philipp,
5 sp:Philipp_Gemmeke;
6 spp:creator sp:Christian_Weber,
7 sp:Michael_Goetz;

8 spp:exampleRequest spimg:BMG_example_request.ttl; 9 spp:exampleResponse spimg:BMG_example_response.ttl; 10 spp:sourceCode 11 ; 12 spp:description 13 "Segmentation of brain in a head scan."@en; 14 owl:sameAs sp:BMG_Description; 15 sawsdl:modelReference 16 [ a msm:Postcondition; 17 rdf:value """{ 18 19 ?headscan rdf:type spc:Headscan; 20 dc:format ?format. 21 FILTER (?format = spc:NRRD || ?format = spc:MHA) 22 ?brainImage rdf:type spc:BrainImage; 23 dc:format spc:NRRD; 24 spp:headscan ?headscan. 25 ?brainMask rdf:type spc:BrainMask; 26 dc:format spc:NRRD; 27 spp:headscan ?headscan. 28 29 }"""^^sparql:GraphPattern], 30 [ a msm:Precondition; 31 rdf:value """{ 32 33 ?headscan rdf:type spc:Headscan; 34 dc:format ?format. 35 FILTER (?format = spc:NRRD || ?format = spc:MHA) 36 ?brainAtlasImg rdf:type spc:BrainAtlasImage; 37 dc:format spc:NRRD. 38 ?brainAtlasMask rdf:type spc:BrainAtlasMask; 39 dc:format spc:NRRD. 40 41 }"""^^sparql:GraphPattern]. Listing A.2: Semantic description of semantic brain mask generation expert. A.1.2 Standard Normalization

1 exp:BN 2 rdf:type sep:SemanticExpert, spc:BrainNormalization; 3 rdfs:label "Semantic Brain Mask Normalization Expert"; 4 spp:contributor sp:Patrick_Philipp, 5 sp:Philipp_Gemmeke; 6 spp:creator sp:Christian_Weber, 7 sp:Michael_Goetz; 8 spp:exampleRequest spimg:BN_example_request.ttl; 9 spp:exampleResponse spimg:BN_example_response.ttl; 10 spp:sourceCode 11 ; 12 spp:description 13 "Normalization of brain mask in a head scan."@en; 14 owl:sameAs sp:BN_Description; 15 sawsdl:modelReference 16 [ a msm:Postcondition;

204 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 9 8 7 6 5 4 3 2 1 .. outNormalization Robust A.1.3 exp:RBN w:aessp:RBN_Description; sawsdl:modelReference owl:sameAs spp:description spimg:RBN_example_response.ttl; spp:sourceCode spp:exampleResponse p:rao sp:Christian_Weber, spimg:RBN_example_request.ttl; spc:RobustBrainNormalization; sep:SemanticExpert, sp:Patrick_Philipp, spp:exampleRequest spp:creator spp:contributor rdfs:label rdf:type Smni ecito fsmni ri omlzto expert. normalization brain semantic of description Semantic A.3: Listing msm:Postcondition; scan."@en; a head [ a in mask brain of Normalization "Robust ; Expert"; Normalization Mask Brain Robust "Semantic msm:Precondition; a [ d:au """{ rdf:value """{ rdf:value """{ rdf:value me d:yespc:ManualSegmentation; rdf:type spc:BrainMask; spc:Headscan; ?mSeg rdf:type spc:MHA) = ?format || spc:NRRD rdf:type ?brainMask = (?format FILTER ?headscan spc:BrainMask; spc:Headscan; rdf:type spc:MHA) = ?format || spc:NRRD rdf:type ?brainMask = (?format FILTER ?headscan spc:NormalizedBrainMask; spc:BrainMask; rdf:type spc:Headscan. ?normMask rdf:type rdf:type ?brainMask ?headscan }"""^^sparql:GraphPattern]. }"""^^sparql:GraphPattern], cfra spc:NRRD; spc:NRRD; dc:format ?headscan. spp:headscan ?format. dc:format dc:format spc:NRRD; ?headscan. spp:headscan ?format. dc:format dc:format ?brainMask. spc:NRRD; ?headscan; spp:brainMask spp:headscan spc:NRRD; dc:format ?headscan. spp:headscan dc:format sp:Michael_Goetz; sp:Philipp_Gemmeke; . eatcExperts Semantic A.1 205

A Appendix A Appendix: Semantics

28 spp:headscan ?headscan. 29 ?normMask rdf:type spc:RobustNormalizedBrainMask; 30 dc:format spc:NRRD; 31 spp:headscan ?headscan; 32 spp:brainMask ?brainMask; 33 spp:manualSegmentation 34 ?mSeg. 35 36 }"""^^sparql:GraphPattern], 37 [ a msm:Precondition; 38 rdf:value """{ 39 40 ?headscan rdf:type spc:Headscan; 41 dc:format ?format. 42 FILTER (?format = spc:NRRD || ?format = spc:MHA) 43 ?brainMask rdf:type spc:BrainMask; 44 dc:format spc:NRRD; 45 spp:headscan ?headscan. 46 ?mSeg rdf:type spc:ManualSegmentation; 47 dc:format spc:NRRD; 48 spp:headscan ?headscan. 49 50 }"""^^sparql:GraphPattern]. Listing A.4: Semantic description of semantic robust brain normalization expert. A.1.4 Registration

1 exp:BR 2 rdf:type sep:SemanticExpert, spc:BrainRegistration; 3 rdfs:label "Semantic Brain Mark Registration Expert"; 4 spp:contributor sp:Patrick_Philipp, 5 sp:Philipp_Gemmeke; 6 spp:creator sp:Christian_Weber, 7 sp:Michael_Goetz; 8 spp:exampleRequest spimg:TS_example_request.ttl; 9 spp:exampleResponse spimg:TS_example_response.ttl; 10 spp:description 11 "Registration of normalized brain masks in head scans."@en; 12 owl:sameAs sp:BR_Description; 13 sawsdl:modelReference 14 [ a msm:Postcondition; 15 rdf:value """{ 16 17 ?norm_bag rdf:type rdf:Bag. 18 ?reg_bag rdf:type rdf:Bag. 19 ?headscan rdf:type spc:Headscan; 20 dc:format ?format. 21 FILTER (?format = spc:NRRD || ?format = spc:MHA) 22 ?brainMask rdf:type spc:BrainMask; 23 dc:format spc:NRRD; 24 spp:headscan ?headscan. 25 ?normMask rdf:type ?norm; 26 rdfs:member ?norm_bag; 27 dc:format spc:NRRD;

206 20 19 18 17 16 15 14 13 12 11 10 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 9 8 7 6 5 4 3 2 1 .. uo Segmentation Tumor A.1.5 exp:TS w:aessp:TS_Description; sawsdl:modelReference owl:sameAs spp:description spp:sourceCode p:rao sp:Christian_Weber, spimg:TS_example_response.ttl; Expert"; spimg:TS_example_request.ttl; Segmentation Tumor "Semantic sep:SemanticExpert; spp:exampleResponse sp:Patrick_Philipp, spp:exampleRequest spp:creator spp:contributor rdfs:label rdf:type Smni ecito fsmni ri eitainexpert. registration brain semantic of description Semantic A.5: Listing msm:Postcondition; scans."@en; a head [ in masks brain registered of segmentation "Tumor ; msm:Precondition; a [ d:au """{ rdf:value """{ rdf:value hasa d:yespc:Headscan; rdf:type ?headscan ?norm; || spc:NormalizedBrainMask = (?norm FILTER spc:BrainMask; rdf:type spc:Headscan; rdf:Bag; ?normMask rdf:type spc:MHA) = ?format || spc:NRRD rdf:type ?brainMask = (?format rdf:type FILTER ?headscan ?norm_bag spc:RegisteredBrainMask; rdf:type ?regMask || spc:NormalizedBrainMask = (?norm FILTER nr spc:RobustNormalizedBrainMask). = ?norm spc:RobustNormalizedBrainMask). = ?norm }"""^^sparql:GraphPattern]. }"""^^sparql:GraphPattern], cfra ?format. dc:format ?brainMask. spc:NRRD; ?headscan; spp:brainMask ?norm_bag; spp:headscan dc:format spc:NRRD; rdfs:member ?headscan. spp:headscan ?format. dc:format dc:format spc:NRRD; ?reg_bag; spp:normalizedMask dc:format rdfs:member ?brainMask. ?headscan; spp:brainMask spp:headscan sp:Michael_Goetz; sp:Philipp_Gemmeke; ?normMask. . eatcExperts Semantic A.1 207

A Appendix A Appendix: Semantics

21 FILTER (?format = spc:NRRD || ?format = spc:MHA) 22 ?normMask rdf:type ?norm; 23 dc:format spc:NRRD; 24 spp:headscan ?headscan; 25 FILTER (?norm = spc:NormalizedBrainMask || 26 ?norm = spc:RobustNormalizedBrainMask). 27 ?regMask rdf:type spc:RegisteredBrainMask; 28 dc:format spc:NRRD; 29 spp:normalizedMask 30 ?brainMask. 31 ?tumor rdf:type spc:SegmentedTumorMask; 32 dc:format spc:NRRD. 33 spp:registeredMask 34 ?regMask. 35 36 }"""^^sparql:GraphPattern], 37 [ a msm:Precondition; 38 rdf:value """{ 39 40 ?headscan rdf:type spc:Headscan; 41 dc:format ?format. 42 FILTER (?format = spc:NRRD || ?format = spc:MHA) 43 ?normMask rdf:type ?norm; 44 dc:format spc:NRRD; 45 spp:headscan ?headscan; 46 FILTER (?norm = spc:NormalizedBrainMask || 47 ?norm = spc:RobustNormalizedBrainMask). 48 ?regMask rdf:type spc:RegisteredBrainMask; 49 dc:format spc:NRRD; 50 spp:normalizedMask 51 ?brainMask. 52 53 }"""^^sparql:GraphPattern]. Listing A.6: Semantic description of semantic tumor segmentation expert. A.1.6 Map Generation

1 exp:MG 2 rdf:type sep:SemanticExpert; 3 rdfs:label "Semantic Map Generation Expert"; 4 spp:contributor sp:Patrick_Philipp, 5 sp:Philipp_Gemmeke; 6 spp:creator sp:Christian_Weber, 7 sp:Michael_Goetz; 8 spp:exampleRequest spimg:TPM_example_request.ttl; 9 spp:exampleResponse spimg:TPM_example_response.ttl; 10 spp:description 11 "Generates the final tumour progression map of registered brain 12 masks in head scans."@en; 13 owl:sameAs sp:TPM_Description; 14 sawsdl:modelReference 15 [ a msm:Postcondition; 16 rdf:value """{ 17

208 10 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 9 8 7 6 5 4 3 2 1 .. ugclPaeRecognition Phase Surgical A.1.7 exp:MLPR spp:description dslbl"eatcM-ae hs eonto Expert"; sp:Darko_Katic; Recognition Phase ML-based "Semantic sep:SemanticExpert; spimg:MLPR_example_response.ttl; spimg:MLPR_example_request.ttl; sp:Patrick_Philipp; spp:sourceCode spp:exampleResponse spp:exampleRequest spp:creator spp:contributor rdfs:label rdf:type Smni ecito fsmni a eeainexpert. generation map semantic of description Semantic A.7: Listing ; msm:Precondition; a [ d:au """{ rdf:value rgakrftp spc:RegisteredBrainMask; rdf:type ?norm; ?regMask || spc:NormalizedBrainMask = (?norm spc:Headscan; FILTER rdf:type rdf:Bag. spc:MHA) = ?format || spc:NRRD rdf:type ?normMask = (?format rdf:type FILTER ?headscan ?norm_bag spc:RegisteredBrainMask; rdf:type ?norm; ?regMask || spc:NormalizedBrainMask spc:Headscan; = (?norm rdf:type FILTER rdf:Bag; spc:MHA) = rdf:Bag. spc:TumorProgressionMap. ?format || spc:NRRD rdf:type ?normMask = (?format FILTER rdf:type rdf:type rdf:type ?headscan ?tpm_bag ?reg_bag ?tpm nr spc:RobustNormalizedBrainMask). = ?norm spc:RobustNormalizedBrainMask). = ?norm }"""^^sparql:GraphPattern]. }"""^^sparql:GraphPattern], cfra spc:NRRD; ?reg_bag; spp:normalizedMask dc:format rdfs:member spc:NRRD; ?headscan. ?norm_bag; spp:headscan dc:format ?format. rdfs:member dc:format spc:NRRD; ?reg_bag; spp:normalizedMask dc:format rdfs:member spc:NRRD; ?headscan. spp:headscan ?format. dc:format ?normMask. dc:format rdf:_1 ?tpm_bag. spp:mapContent ?brainMask. ?brainMask. . eatcExperts Semantic A.1 209

A Appendix A Appendix: Semantics

11 "Surgical phase recognition based on a ML-based algorithm."; 12 owl:sameAs sp:MLPR_Description; 13 sawsdl:modelReference 14 [ a msm:Postcondition; 15 rdf:value """{ 16 17 ?phase rdf:type spc:Phase; 18 spp:value ?value. 19 ?event rdf:type spc:SurgicalEvent; 20 spp:instrument ?instrument; 21 spp:action ?action; 22 spp:structure ?structure. 23 spp:phase ?phase. 24 25 }"""^^sparql:GraphPattern], 26 [ a msm:Precondition; 27 rdf:value """{ 28 29 ?col rdf:type rdf:Bag; 30 rdf:li ?trainingSample. 31 ?trainingSample rdf:type spc:Surgery. 32 ?ontology rdf:type spc:Ontology. 33 ?event rdf:type spc:SurgicalEvent; 34 spp:instrument ?instrument; 35 spp:action ?action; 36 spp:structure ?structure. 37 ?instrument rdf:type spc:Instrument. 38 ?action rdf:type spc:Action. 39 ?structure rdf:type spc:TreatedStructure. 40 41 }"""^^sparql:GraphPattern]. Listing A.8: Semantic description of ML-based phase recognition expert.

1 exp:RPR 2 rdf:type sep:SemanticExpert; 3 rdfs:label "Semantic rule-based Phase Recognition Expert"; 4 spp:contributor sp:Patrick_Philipp; 5 spp:creator sp:Darko_Katic; 6 spp:exampleRequest spimg:RPR_example_request.ttl; 7 spp:exampleResponse spimg:RPR_example_response.ttl; 8 spp:sourceCode 9 ; 10 spp:description 11 "Surgical phase recognition based on a rule-based algorithm."; 12 owl:sameAs sp:RPR_Description; 13 sawsdl:modelReference 14 [ a msm:Postcondition; 15 rdf:value """{ 16 17 ?phase rdf:type spc:Phase; 18 spp:value ?value. 19 ?event rdf:type spc:SurgicalEvent; 20 spp:instrument ?instrument; 21 spp:action ?action;

210 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 9 8 7 6 5 4 3 2 1 oeta aaee ofiuain aebe eoe rmpe n otodtosfrbetter for postconditions and pre- from removed presentation. been have configurations parameter that Note Recognition Entity Named A.1.8 exp:GNER w:aessepkb:GNER_Description; sawsdl:modelReference owl:sameAs sepnlp:description dslbl"eatcNmdEtt eonto Expert"; Recognition Entity Named sepkb:Generic; "Semantic sep:SemanticExpert; sepkb:GNER_example_response.ttl; sepnlp:sourceCode sepkb:GNER_example_request.ttl; sepkb:Patrick_Philipp; sepnlp:exampleResponse sepnlp:exampleRequest sepnlp:creator sepnlp:contributor rdfs:label rdf:type Smni ecito frl-ae hs eonto expert. recognition phase rule-based of description Semantic A.9: Listing msm:Postcondition; a [ text."; raw for algorithm recognition entity "Named ; msm:Precondition; a [ d:au """{ rdf:value """{ rdf:value antto2rftp sepnlp:Annotation; sepnlp:Token rdf:type rdf:Bag; rdf:type sepnlp:Annotation; ?annotation2 ?annotation2. ?token1 rdf:type rdf:type sepnlp:disjunct ?annotation1 ?annotation1 ?col spc:Action. spc:TreatedStructure. spc:Instrument. rdf:type rdf:type spc:SurgicalEvent; rdf:type spc:Ontology. ?structure rdf:type ?action ?instrument rdf:type ?event ?ontology }"""^^sparql:GraphPattern]. }"""^^sparql:GraphPattern], i:stig?text2; ?end1; nif:isString ?start1; ?mention1; ?text. nif:referenceContext nif:endIndex ?token1; nif:beginindex "true"^^xsd:boolean. ?annotation2. ?text1; ?annotation1, nif:anchorOf sepnlp:isEntity sepnlp:token nif:isString rdf:li ?action; ?structure. spp:structure spp:action ?instrument; spp:instrument ?phase. ?structure. spp:phase spp:structure . eatcExperts Semantic A.1 211

A Appendix A Appendix: Semantics

31 sepnlp:token ?token2; 32 sepnlp:isEntity "true"^^xsd:boolean. 33 ?token2 rdf:type sepnlp:Token 34 nif:anchorOf ?mention2; 35 nif:beginindex ?start2; 36 nif:endIndex ?end2; 37 nif:referenceContext ?text. 38 ?text rdf:type ?type. 39 FILTER (?type = nif:Sentence || ?textType = nif:Paragraph). 40 41 }"""^^sparql:GraphPattern], 42 [ a msm:Precondition; 43 rdf:value """{ 44 45 ?text rdf:type ?type. 46 FILTER (?type = nif:Sentence || ?textType = nif:Paragraph). 47 48 }"""^^sparql:GraphPattern]. Listing A.10: Generic semantic description of NER expert.
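A linked program in the style of Listing A.13 could invoke this expert as sketched below: whenever a sentence resource is present, the rule issues an HTTP POST to exp:GNER with the matched graph as request body. The rule is only an illustration (the resource :text1 is a placeholder), and the prefixes follow Listings A.1 and A.12.

{
  # antecedent: an input sentence to be annotated
  :text1 rdf:type nif:Sentence.
} => {
  # consequent: request the generic NER expert for this sentence
  _:req http:mthd httpm:POST;
        http:requestURI exp:GNER;
        http:body
        {
          :text1 rdf:type nif:Sentence.
        }.
}.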

A.1.9 Named Entity Disambiguation Note that parameter configurations have been removed from pre- and postconditions for better presentation. 1 exp:GNED 2 rdf:type sep:SemanticExpert; 3 rdfs:label "Semantic Named Entity Disambiguation Expert"; 4 sepnlp:contributor sepkb:Patrick_Philipp; 5 sepnlp:creator sepkb:Generic; 6 sepnlp:exampleRequest sepkb:GNED_example_request.ttl; 7 sepnlp:exampleResponse sepkb:GNED_example_response.ttl; 8 sepnlp:sourceCode 9 ; 10 sepnlp:description 11 "Named entity disambiguation algorithm for raw text."; 12 owl:sameAs sepkb:GNED_Description; 13 sawsdl:modelReference 14 [ a msm:Postcondition; 15 rdf:value """{ 16 17 ?col rdf:type rdf:Bag; 18 rdf:li ?annotation1. 19 20 ?annotation1 rdf:type sepnlp:Annotation; 21 nif:isString ?text; 22 sepnlp:token ?token1. 23 sepnlp:isEntity "true"^^xsd:boolean; 24 sepnlp:resource ?resource. 25 26 ?token1 rdf:type sepnlp:Token 27 nif:anchorOf ?mention; 28 nif:beginindex ?start;

212 11 10 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 9 8 7 6 5 4 3 2 1 efis ieafl xml nhwt xct eatceprsbsdo h tpo brain of step respective and the task on instantiated planners. based an grounded for and experts condition abstract semantic a for execute present outputs exemplary to then We how TPMs. on in example generation mask full a give first We Agent Semantic A.2 . xnat: http://aifb <-ls3vm2.aifbkitedu:8080/xnatwrapper/xnat#> @prefix vocxnat : @prefix qrl : @prefix sparql : @prefix sawsdl : http://www.w3org/2011/-methods#> < < http://www.w3org/2002/07/owl#> @prefix http: @prefix httpm : < http://www.w3org/1999/02/22-rdf-syntaxns#> @prefix owl: @prefix rdfs : @prefix rdf: @prefix @prefix: msm:Precondition; a [ d:au """{ rdf:value tx d:ye?type. sepnlp:Token rdf:type nif:Paragraph). = ?textType || nif:Sentence = (?type FILTER rdf:type sepnlp:Annotation; ?text sepnlp:Token ?token2 rdf:type rdf:Bag; rdf:type sepnlp:Annotation; ?annotation2 ?annotation2. ?token1 rdf:type rdf:type sepnlp:disjunct ?annotation1 ?annotation1 ?col Gnrcsmni ecito fNDexpert. NED of description semantic Generic A.11: Listing }"""^^sparql:GraphPattern]. }"""^^sparql:GraphPattern], i:nIdx?end2; ?start2; ?mention2; ?text. nif:referenceContext nif:endIndex ?token2; nif:beginindex "true"^^xsd:boolean. ?text2; nif:anchorOf sepnlp:isEntity ?end1; sepnlp:token nif:isString ?start1; ?mention1; ?text. nif:referenceContext nif:endIndex ?token1; nif:beginindex "true"^^xsd:boolean. ?annotation2. ?text1; ?annotation1, nif:anchorOf sepnlp:isEntity sepnlp:token nif:isString rdf:li ?end; ?text. nif:referenceContext nif:endIndex . eatcAgent Semantic A.2 213

A Appendix A Appendix: Semantics

12 @prefix sp:. 13 @prefix spc:. 14 @prefix spp:. 15 @prefix sep:. 16 @prefix exp:. 17 @prefix dc:. Listing A.12: Prefixes for linked programs.

A.2.1 Semantic Expert Execution

1 { 2 :hbag rdf:type rdf:Bag. 3 :headscan1 rdf:type spc:Headscan; 4 dc:format spc:NRRD. 5 :headscan2 rdf:type spc:Headscan; 6 dc:format spc:NRRD. 7 :brainMask1 rdf:type spc:BrainMask; 8 dc:format spc:NRRD; 9 spp:headscan xnat:headscan1. 10 :brainMask2 rdf:type spc:BrainMask; 11 dc:format spc:NRRD; 12 spp:headscan xnat:headscan2. 13 :normMask1 rdf:type spc:NormalizedBrainMask; 14 rdfs:member xnat:hbag; 15 dc:format spc:NRRD; 16 spp:headscan xnat:headscan1; 17 spp:brainMask xnat:brainMask1. 18 :normMask2 rdf:type spc:NormalizedBrainMask; 19 rdfs:member xnat:hbag; 20 dc:format spc:NRRD; 21 spp:headscan xnat:headscan2; 22 spp:brainMask xnat:brainMask2. 23 } => { 24 _:a http:mthd httpm:POST; 25 http:requestURI exp:BMG; 26 http:body 27 { 28 :hbag rdf:type rdf:Bag. 29 :headscan1 rdf:type spc:Headscan; 30 dc:format spc:NRRD. 31 :headscan2 rdf:type spc:Headscan; 32 dc:format spc:NRRD. 33 :brainMask1 rdf:type spc:BrainMask; 34 dc:format spc:NRRD; 35 spp:headscan xnat:headscan1. 36 :brainMask2 rdf:type spc:BrainMask; 37 dc:format spc:NRRD; 38 spp:headscan xnat:headscan2. 39 :normMask1 rdf:type spc:NormalizedBrainMask; 40 rdfs:member xnat:hbag; 41 dc:format spc:NRRD; 42 spp:headscan xnat:headscan1;

214 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 50 49 48 47 46 45 44 43 9 8 7 6 5 4 3 2 1 .. eatcPlanning Semantic A.2.2 pro- two for expert registration brain semantic executing for program Linked A.13: Listing ntsat d:yespGoneSae sep:StartState. sep:GroundedState, spc:MLBasedPhaseRecognition; sep:SemanticExpert, rdf:type spc:RuleBasedPhaseRecognition; sep:SemanticExpert, xnat:start1 spc:MapGeneration; rdf:type sep:SemanticExpert, rdf:type spc:TumorSegmentation; sep:SemanticExpert, exp:MLPR rdf:type spc:BrainRegistration; sep:SemanticExpert, exp:RPR rdf:type spc:RobustBrainNormalization; exp:MG sep:SemanticExpert, rdf:type spc:BrainNormalization; sep:SemanticExpert, exp:TS rdf:type spc:BrainMaskGeneration; exp:BR sep:SemanticExpert, rdf:Bag. rdf:type exp:RBN rdf:type rdf:type exp:BN exp:BMG :ebag }. nrMs2rftp spc:NormalizedBrainMask; }. rdf:type :normMask2 sep:GoundedCondition; a [ d:au """{ rdf:value esdptetheadscans. patient cessed ntsg d:yespc:ManualSegmentation; spc:ManualSegmentation; rdf:type rdf:type xnat:seg2 spc:BrainAtlasMask; spc:Patient. xnat:seg1 spc:BrainAtlasImage; rdf:type xnat:brainAtlasMsk rdf:type rdf:type xnat:brainAtlasImg spc:Headscan; xnat:patient1 rdf:type spc:Headscan; xnat:headscan2 rdf:type xnat:headscan1 dsmme _:ebag. rdfs:member _:ebag. rdfs:member _:ebag. rdfs:member _:ebag. rdfs:member _:ebag. rdfs:member _:ebag. rdfs:member _:ebag. rdfs:member _:ebag. rdfs:member sawsdl:modelReference p:riMs xnat:brainMask2. spc:NRRD; xnat:headscan2; spp:brainMask xnat:hbag; spp:headscan dc:format rdfs:member xnat:brainMask1. spp:brainMask p:ain xnat:patient1. spc:NRRD; spp:patient spc:NRRD. dc:format spc:NRRD. dc:format dc:format xnat:patient1. spp:patient spc:NRRD; spp:manualSegmentation dc:format xnat:patient1. spp:patient spc:NRRD; spp:manualSegmentation dc:format xnat:seg2. xnat:seg1. . eatcAgent Semantic A.2 215

A Appendix A Appendix: Semantics

42 dc:format spc:NRRD; 43 spp:patient xnat:patient2. 44 }"""^^sparql:GraphPattern]. 45 46 xnat:goal1 rdf:type sep:AbstractState, sep:GoalState; 47 sawsdl:modelReference 48 [ a sep:AbstractPostcondition; 49 rdf:value """{ 50 ?tpm1 rdf:type spc:TumorProgressionMap; 51 spp:mapContent ?tpm_seq. 52 ?reg_bag rdf:type rdf:Bag. 53 ?tpm_seq rdf:type rdf:Seq; 54 rdf:_1 ?reg_mask1; 55 rdf:_2 ?reg_mask2. 56 ?reg_mask1 rdf:type spc:RegisteredBrainMask; 57 dc:format spc:NRRD; 58 spp:headscan xnat:headscan1; 59 rdfs:member ?reg_bag; 60 spp:normalizedMask 61 ?brainMask1. 62 ?reg_mask2 rdf:type spc:RegisteredBrainMask; 63 dc:format spc:NRRD; 64 spp:headscan xnat:headscan2; 65 rdfs:member ?reg_bag; 66 spp:normalizedMask 67 ?brainMask2. 68 ?normMask1 rdf:type ?norm; 69 dc:format spc:NRRD; 70 spp:headscan xnat:headscan1. 71 ?normMask2 rdf:type ?norm; 72 dc:format spc:NRRD; 73 spp:headscan xnat:headscan2. 74 FILTER (?norm = spc:NormalizedBrainMask || 75 ?norm = spc:RobustNormalizedBrainMask). 76 }"""^^sparql:GraphPattern]. 77 OPTIONAL { 78 ?normMask1 spp:manualSegmentation 79 xnat:seg1. 80 ?normMask2 spp:manualSegmentation 81 xnat:seg2. 82 } Listing A.14: Grounded precondition for executing meta components for a TPM example.

Based on the condition, we now illustrate the generated outputs of the respective semantic meta components.

1 _:pbag rdf:type rdf:Bag, sep:PolicyBag. 2 3 _:pol1 rdf:type rdf:Seq; 4 rdfs:member _:pbag; 5 rdf:_1 exp:BMG; 6 rdf:_2 exp:BN; 7 rdf:_3 exp:BR; 8 rdf:_4 exp:TS;

216 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 9 Eapeotu ftesmni btatpanrfrteeapecondition. example the for planner abstract semantic the of output Example A.15: Listing xnat:grounding2 spc:BrainMaskGeneration; sep:SemanticExpert, rdf:Bag. rdf:type rdf:type xnat:grounding1 exp:BMG sep:GroundedState. sep:StartState, sep:AbstractState. _:ebag sep:State, sep:GoalState, sep:State, spc:MapGeneration. sep:SemanticExpert, spc:TumorSegmentation. sep:SemanticExpert, spc:BrainRegistration. rdf:type sep:SemanticExpert, rdf:type rdf:type spc:RobustBrainNormalization. xnat:start1 sep:SemanticExpert, spc:BrainNormalization. rdf:type sep:SemanticExpert, xnat:goal1 rdf:type spc:BrainMaskGeneration. sep:SemanticExpert, exp:MG rdf:type rdf:type exp:TS exp:BR rdf:type exp:RBN exp:BN exp:BMG rdf:Seq; rdf:type _:pol2 sep:GoundedCondition; a [ sep:GoundedCondition; a [ d:au """{ rdf:value """{ rdf:value ntbantaMkrftp spc:BrainAtlasMask; spc:Patient. spc:BrainAtlasImage; rdf:type xnat:brainAtlasMsk rdf:type spc:Headscan; rdf:type xnat:brainAtlasImg xnat:patient1 rdf:type xnat:headscan2 spc:BrainAtlasMask; spc:Patient. spc:BrainAtlasImage; rdf:type xnat:brainAtlasMsk rdf:type spc:Headscan; rdf:type xnat:brainAtlasImg xnat:patient1 rdf:type xnat:headscan1 e:xetexp:BMG; spc:SemanticExpertExecution; sawsdl:modelReference sep:expert rdf:type exp:BMG; spc:SemanticExpertExecution; sawsdl:modelReference sep:expert rdf:type _:ebag. rdfs:member exp:TPM. exp:TS; exp:BR; exp:RBN; rdf:_5 exp:BMG; rdf:_4 rdf:_3 rdf:_2 _:pbag; rdf:_1 exp:TPM. rdfs:member rdf:_5 }"""^^sparql:GraphPattern]. }"""^^sparql:GraphPattern]. cfra spc:NRRD. spc:NRRD. dc:format dc:format xnat:patient1. spc:NRRD. spp:patient dc:format spc:NRRD. spc:NRRD. dc:format dc:format xnat:patient1. spc:NRRD. spp:patient dc:format . eatcAgent Semantic A.2 217

A Appendix A Appendix: Semantics

34 35 xnat:goal1 rdf:type sep:State, sep:GoalState, sep:AbstractState. 36 xnat:start1 rdf:type sep:State, sep:StartState, sep:GroundedState. Listing A.16: Example output of the semantic grounded planner based on the abstract plan for the example condition.
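To execute such a grounded plan, each grounding can be dispatched with a rule analogous to Listing A.13. The following sketch repeats the grounded condition of xnat:grounding1 from Listing A.16 as request body for the bound expert; it is an illustration of the pattern, not one of the agent's actual linked programs, and assumes the prefixes of Listing A.12.

{
  # antecedent: a grounded expert execution bound to exp:BMG
  xnat:grounding1 rdf:type spc:SemanticExpertExecution;
                  sep:expert exp:BMG.
} => {
  # consequent: POST the grounded condition to the expert
  _:req http:mthd httpm:POST;
        http:requestURI exp:BMG;
        http:body
        {
          xnat:headscan1 rdf:type spc:Headscan;
                         dc:format spc:NRRD;
                         spp:patient xnat:patient1.
          xnat:brainAtlasImg rdf:type spc:BrainAtlasImage;
                             dc:format spc:NRRD.
          xnat:brainAtlasMsk rdf:type spc:BrainAtlasMask;
                             dc:format spc:NRRD.
        }.
}.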
