TERESA - 611153 - FP7/2013-2016

Deliverable D4.3: Navigation Demonstrator

Project Acronym: TERESA
Project Full Title: Telepresence Reinforcement-Learning Social Agent
Grant Agreement no.: 611153

Due date: M36, November 2016
Delivery: November 30, 2016
Lead partner: UPO
Dissemination level: Public
Status: Submitted
Version: v2.0

DOCUMENT INFO

Date        Version  Author         Comments
01.11.2016  v0.1     Luis Merino    Scheme
10.11.2016  v0.2     Noe Perez      Macro actions and social navigation
14.11.2016  v0.3     Rafael Ramon   Approach people section
22.11.2016  v0.4     Jesus Capitan  Walking side by side section
25.11.2016  v1.0     UPO team       First draft
28.11.2016  v1.1     Joao Messias   Revision
30.11.2016  v2.0     UPO team       Submitted

Contents

1 Contributors ...... 7
2 Executive summary ...... 8
3 TERESA human-aware navigation stack ...... 9
3.1 Introduction ...... 9
3.2 Robot sensors for navigation ...... 10
3.3 The navigation stack architecture ...... 11
3.3.1 Behavior manager ...... 12
3.3.2 Architecture of navigation behaviors ...... 13
4 Social waypoint navigation ...... 16
4.1 Introduction ...... 16
4.2 Path planning ...... 16
4.3 Low-level control ...... 17
4.4 Learning social navigation ...... 17
4.4.1 Learning a RRT* cost function ...... 18
4.4.2 Features for social navigation ...... 21
4.4.3 Experimental results ...... 23
4.5 Navigation evaluation ...... 25
4.5.1 Benchmarking according to ERL-SR ...... 25
4.5.2 Social evaluation ...... 27
5 Yield ...... 31
6 Approaching people ...... 33
6.1 Introduction ...... 33
6.2 GMMs for interaction modeling ...... 34
6.3 The reproduction planner ...... 35
6.4 Experimental results ...... 35
6.4.1 Gathering the data for learning ...... 36
6.4.2 Metrics ...... 36
6.4.3 Results ...... 37
7 Walking side-by-side ...... 40
7.1 Introduction ...... 40
7.2 Preliminaries ...... 41
7.2.1 Social force model ...... 41
7.2.2 Partially observable Monte-Carlo planning ...... 43
7.3 Hierarchical planner for walking side-by-side with a person ...... 45
7.4 Low-level controller based on SFM ...... 46
7.5 High-level planner based on POMCP ...... 47
7.5.1 States ...... 47
7.5.2 Actions ...... 48
7.5.3 Observations ...... 48
7.5.4 Reward function ...... 48
7.5.5 Transition and observation probability functions ...... 48


7.5.6 Other issues ...... 49
7.6 Experiments ...... 50
8 Integrated Experiments ...... 54
8.1 Centre Sportif de l'Aube ...... 54
8.2 Les Arcades ...... 55
9 Conclusions ...... 56

List of Figures

1 The final version of the TERESA robot navigating among people ...... 9
2 Final disposition of sensors on TERESA ...... 11
3 SMACH viewer representation of the human-aware navigation macro-actions ...... 13
4 Finite state machine for navigation ...... 14
5 General architecture for navigation macro-actions ...... 14
6 Descriptive capture of the navigation planning architecture ...... 17
7 General cost function learning scheme ...... 20
8 Features for social navigation ...... 22
9 Social cost as Gaussian mixture function ...... 22
10 Scenarios for learning cross-validation ...... 24
11 Evolution of the parameters during learning ...... 24
12 Visual comparison of demonstrations and learnt paths ...... 24
13 Relative errors of the learnt paths ...... 25
14 Scenarios for navigation evaluation ...... 26
15 Scenario and waypoints for navigation functionality evaluation ...... 26
16 Capture of the real experiments for social navigation evaluation ...... 29
17 Static scenario for social navigation evaluation ...... 30
18 Path of the dynamic scenario for social navigation evaluation ...... 32
19 Pre-defined yield behavior in doorways ...... 33
20 Demonstrated trajectories for the approaching task ...... 34
21 A view of the general set up performed to collect the data ...... 36
22 Costs evolution of GMM-RRT* approaches ...... 37
23 Planning paths vs. demonstrated paths ...... 38
24 Managing homotopies ...... 38
25 Generalization ...... 39
26 Social Force Model ...... 42
27 Walking side-by-side simulated environment ...... 50
28 Simulated results for the walking side-by-side behavior (I) ...... 51
29 Simulated results for the walking side-by-side behavior (II) ...... 52
30 Navigation experiments at Centre Sportif ...... 54
31 Pictures of experiment 4 at Les Arcades ...... 55
32 Approaching an interaction target ...... 56
33 Navigation to waypoint ...... 57

List of Tables

1 Rockin evaluation metrics for each waypoint ...... 27
2 Rockin evaluation metrics ...... 27
3 Results for social navigation evaluation with a static scenario with two people ...... 31
4 Results for social navigation evaluation with a static group of two people ...... 31
5 Results for social navigation evaluation of a free run with two people moving in the scenario ...... 32
6 Trajectory quality approaching a person. Smaller values are better for all metrics. The best values are highlighted in boldface ...... 38
7 Parameters for the walking side-by-side algorithms ...... 51

List of Algorithms

1 RRT*-IRL ...... 21
2 SFM-based controller for robot navigation ...... 46


1 Contributors

The authors of the deliverable are the members of the UPO team. Noé Pérez has written mainly Sections 4 and 5 and contributed to Section 3. Rafael Ramón has delivered Section 6. Jesús Capitán and Ignacio Pérez have mainly written Section 7. Luis Merino has written the introduction and conclusions and part of Section 3, and contributed, with Jesús Capitán, to the rest of the deliverable. The deliverable has been reviewed by Joao Messias of UvA.


2 Executive summary

This deliverable is the outcome of Tasks 4.2 and 4.3 of the project. The demonstrator itself is the navigation stack that has been deployed on the TERESA robot and that has been tested, at least partially, in the experiments of the project and in several local experiments. However, as there is no other document describing the details of some parts of the navigation stack, this document will not only show some results of the demonstrator, but will also present the main techniques behind the navigation modules. Together with Deliverables D4.1 [9] and D4.2 [10], it completes the description of the main elements developed in WP4.

The document first states in Section 3 the objectives of the navigation stack, moving from a classical navigation module to a human-aware navigation module, in which the presence of persons in the environment is considered at the different levels of the system. As it is not possible to devise a navigation module able to deal with all potential social situations, a set of navigation macro-actions is considered and implemented, coordinated by a high-level behavior manager that also integrates them with the body-pose macros described in [12]. Then, the particularities of the different macros are presented and evaluated in local experiments.

The human-aware navigation macro is described in Section 4. There, the main aspects of the motion planning and control elements are described, as well as how cost functions learned from data are used to incorporate preferences related to normative social behavior. Section 6 describes the approaching-persons macro. There, it is shown how social constraints related to this task can also be extracted from data and incorporated into the previous motion planners. Section 7 describes a walking side-by-side macro, in which the robot should accompany one person towards a destination. First, a control-based approach considering a social force model is described. However, this task is a joint task between the person and the robot, in which the intentions and commitment of the person are not observable; this macro is also particularly affected by uncertainties due to perception errors. Thus, the macro is implemented using a planner that considers uncertainties to take these issues into account.

Finally, Section 8 shows some results of the navigation stack integrated with the rest of the TERESA system in the experiments at Les Arcades, Troyes. In these experiments, the perception modules developed in WP2, the HRI of WP3, the body-pose control of WP5 and the navigation of WP4 work together to provide a semi-autonomous telepresence system with social intelligence.


Figure 1: The final version of the TERESA robot navigating among people.

3 TERESA human-aware navigation stack

3.1 Introduction

Telepresence systems allow a human controller (the visitor) to interact remotely with people. Sometimes called "Skype on a stick", such systems let the visitor pilot a remotely located robot, which results in a greater physical presence than standard teleconferencing. One of the potential problems of telepresence systems is the cognitive overload that arises from having to make navigation and interaction decisions at the same time. This may lead to navigation mistakes and to paying less attention to the interaction and communication tasks [60]. Indeed, partial autonomy in terms of navigation is a feature requested by telepresence users according to studies such as [21] and the findings of Deliverable D6.1 [15] on the user studies for the requirements of TERESA.

In TERESA, the objective is to augment the social intelligence and autonomy of the telepresence robot, so that the robot can execute the low-level navigation tasks and the user can concentrate on the interaction with his/her peers. Thus, the original Giraff telepresence robot has been upgraded throughout the project with new sensors for autonomous navigation, as described in Deliverables D6.3, D6.4 and D6.5 [16, 17, 18]. We will denote this new system the TERESA robot (see Fig. 1).

Once the robot has the proper sensors, it is possible to deploy an autonomous navigation system so that the robot is able to provide autonomous waypoint navigation. One common option is the move_base1 software package, which has been used in previous projects [29]. Many other options have been shown to work well in indoor environments (see, for instance, [3]). However, the TERESA robot is devised to work among people, with interaction as its main task. Therefore, enhancing the autonomy of the telepresence robot in terms of navigation involves not only ensuring safe and efficient navigation; the robot should also show social intelligence when performing its tasks. Social intelligence means that robots must respect human social conventions (for instance, approaching a person should be performed in a socially appropriate manner) and guarantee the comfort of surrounding persons.

1http://wiki.ros.org/move_base

This problem is usually called human-aware navigation or social navigation. A couple of extensive surveys on human-aware navigation can be found in [53, 39]. This problem involves different fields such as human perception, cognitive models and motion planning. The TERESA navigation stack integrates these elements.

First, social intelligence requires perception modules that provide human awareness, that is, information about people in the environment, at least their pose. The person pose estimation module for the TERESA robot is described in Deliverable D2.2 [7].

Moreover, social navigation requires additional spatial information besides maps for localization and motion planning. For instance, spatial social affordance maps extend occupancy grid maps to predict potential trajectories of persons and/or the occurrence of events like the appearance and disappearance of persons [43, 45, 34]. In TERESA, models for estimating potential destinations of persons, their likely motions and how to learn this from data have been developed and described in Deliverables D4.1 [9] and D4.2 [10].

Furthermore, human awareness should be considered at all levels of the navigation stack, from task planning to motion planning. At the motion planning level, the problem is usually tackled by including costs and constraints related to social navigation into the motion planners to obtain socially acceptable paths [57, 36]. In those cases, these costs are pre-programmed. However, hard-coded social behaviors may be inappropriate [22]. In other cases (for instance [37, 49]), these costs are based on Proxemics theory [28]. However, as shown in [42], proxemics is focused on scenarios in which people interact, and it may not be suitable for navigating among people. It is in general easier to demonstrate good social behaviors than to formalize them, and thus learning these social costs and constraints from data seems a more grounded approach. In TERESA, the learning of the social cost functions for navigation is described mainly in Deliverables D5.1 [11], D5.3 [12], D5.5 [14] and this deliverable; the extraction of social constraints from data is described in Deliverable D4.2 [10].

Finally, human-aware navigation involves not only treating humans differently from other "obstacles" through the previous costs and constraints, but also reasoning about their intentions and reacting to them [2]. For instance, in applications like robot guidance or meeting with a person, robots have to consider people's goals and commitment in order to act in advance [44]. Moreover, robots may want to avoid certain places where humans intend to go, so as not to disturb them [58]. Thus, the previous models also have to be considered by the planners when the robot makes navigation decisions. The TERESA navigation stack integrates all these elements, and it is briefly described in the next sections.

3.2 Robot sensors for navigation

The TERESA robot is equipped with several sensors that provide appropriate environment measurements to the software stack. Some of them are used for safe navigation (i.e., obstacle avoidance), whereas others are used for retrieving information about the social scene around the robot. Figure 2 shows the location of the sensors on the robot frame. We have defined two main groups that cluster them by their aim, namely navigation and interaction sensors:

LIDARs2 Two scanning laser rangefinders for obstacle detection and localization of the robot.

2https://www.hokuyo-aut.jp/02sensor/07scanner/ust_10lx_20lx.html


Figure 2: Final disposition of sensors on TERESA. The sensors used for navigation are the LIDARs (there is another one in the back, not shown here) and the XTion cameras.

They are located at the front and at the back of the robot and cover 360° around it. They are also used to obtain person detections.

XTion cameras3 These cameras use infrared sensors, adaptive depth detection technology and color image sensing. One of them faces the floor at a certain angle, providing the navigation stack with readings that are useful for 3D obstacle detection, such as steps or table tops. The other one, facing forward, is also used for person detection, so it helps to differentiate simple obstacles such as furniture or walls from people. This camera is used for both navigation and interaction purposes.

Microphone4 This sensor is used as a voice tracker and provides an omnidirectional field of view to detect the active talker. It is used for interaction purposes.

HD camera5 It provides a high-resolution video stream that is analyzed in real time to detect the emotions that can arise while the robot performs its tasks. It is used for interaction purposes.

The other devices depicted in Fig. 2 are aimed at enhancing the user experience, so they provide the users with video streaming (existing camera) and audio (speakers). Users can also interact with the robot by adjusting the volume or through the yes/no buttons. Lastly, the robot can be charged by plugging in the manual charger or by means of a charging station that makes contact with the charger pads depicted in Fig. 2.

3.3 The navigation stack architecture

It is quite complicated to derive a single navigation function able to deal with all the human-aware navigation problems that can occur. Thus, many existing approaches only address one particular problem, like social people avoidance [31], approaching people [59], following a person walking side-by-side [24], etc.

3https://www.asus.com/3D-Sensor/Xtion_PRO_LIVE/ 4https://www.acousticmagic.com/products/voice-tracker-ii-details/ 5http://www.teledynedalsa.com/imaging/products/cameras/area-scan/genie/CR-GM0X-H140X/


We concur that it is very complex to derive general models, costs, planners, etc. for every potential situation, so our navigation architecture reflects this fact. Thus, the navigation stack offers a set of social macro-actions that can be triggered by the visitor of the robot or that can be executed autonomously when the social situation requires it. Each macro-action involves different models and algorithms.

At the broadest level, these macro-actions are classified into Conversation macros and Navigation macros, derived from the analysis of social normative behavior performed in Deliverable D3.1 [8]. The conversation macros are mainly described in Deliverable D5.2 [12], while the navigation macros considered are the following:

• Navigate to Waypoint, in which the robot navigates socially to the desired waypoint, and which is activated with a command from the visitor at the control station by selecting from a waypoint list.

• Navigate to Interaction Target, in which the robot approaches a person to interact with him/her in a social way. The person is selected by the visitor at the control station by clicking on the person in the image. This person is denoted the Interaction Target.

• Walk side-by-side, where the robot tries to accompany a person walking together to another room or place while in conversation. This is also triggered by the visitor.

• Assisted steering. This macro-action is activated whenever the visitor tries to drive the robot manually through his/her control interface. This assistant is in charge of evaluating the velocity commands sent by the visitor, checking possible collisions and generating smooth valid commands when possible (stopping in case of an unavoidable collision).

The previous macros are activated by the user of the robot (explicitly or implicitly). Moreover, there is another group of macro-actions that are triggered by events independent of the user's actions:

• Yield. While the robot is navigating, if any person stands on the robot's path within a narrow space like a doorway, the robot will yield politely, retreating to a position that allows the person to come through first. Once the path is free again, the robot will resume its previous action.

• Wait. This macro is activated if, while the robot is navigating, it finds its path blocked. In order to avoid a collision, the robot stands still for a pre-defined time, trying to resume navigation after the waiting time.

• Navigate Home. This macro-action is triggered whenever the battery level of the robot falls under a certain threshold or the visitor ends the call. In that case, the robot goes back to a pre-defined position where the docking station is installed to charge (Home).

The navigation architecture of TERESA has two levels of decision making. At the highest level there is a Behavior Manager in charge of selecting the different macro-actions and controlling the transitions between them; at the lower level, each macro-action represents a navigation behavior. In the following, these two levels are described.

3.3.1 Behavior manager

The Behavior Manager is the top-level decision-making component, in charge of controlling the navigation functionalities through the concept of macro-actions, which can be implemented in different manners, such as Finite State Machines (FSM), control-based,


or decision-theoretic-based approaches. The whole architecture is based on the ROS middleware6. Each macro-action must provide an interface based on the ROS actionlib7 to the rest of the system, acting as a service. These macro-actions are then activated when certain events happen and/or inputs are received from the visitor. The Behavior Manager is encoded as a Finite State Machine (FSM) that produces the adequate transitions between macro-actions. In particular, we have used SMACH8 for the implementation. A screenshot of the whole FSM for navigation is depicted in Fig. 3, showing the complexity of the system. Figure 4 also shows the part of the general FSM diagram corresponding to the eight navigation macro-actions implemented, along with the available transitions between them according to the events that may occur. The initial macro-action is "Wait for goal", where the robot waits for an event that initiates a transition. The transition to the conversation states is done through the event "Nav succeeded".

Figure 3: SMACH viewer representation of the human-aware navigation macro-actions. In green, the initial macro-action "Wait for goal".
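To make the macro-action interface more concrete, the following is a minimal sketch (not the actual TERESA code) of how a SMACH state machine can chain macro-actions exposed as actionlib servers. The server name navigate_to_waypoint, the topic /teresa/selected_waypoint and the use of a MoveBase-style action type are assumptions made for illustration; the real interfaces of the stack may differ.

#!/usr/bin/env python
# Sketch of a behavior manager with SMACH + actionlib (illustrative names only).
import rospy
import smach
import smach_ros
from geometry_msgs.msg import PoseStamped
from move_base_msgs.msg import MoveBaseAction


class WaitForGoal(smach.State):
    """Initial macro-action: block until the visitor selects a waypoint."""
    def __init__(self):
        smach.State.__init__(self, outcomes=['goal_received'],
                             output_keys=['target_pose'])

    def execute(self, userdata):
        # Stand-in for the visitor interface: wait for a waypoint pose.
        userdata.target_pose = rospy.wait_for_message(
            '/teresa/selected_waypoint', PoseStamped)
        return 'goal_received'


def build_behavior_manager():
    sm = smach.StateMachine(outcomes=['shutdown'])
    with sm:
        smach.StateMachine.add(
            'WAIT_FOR_GOAL', WaitForGoal(),
            transitions={'goal_received': 'NAVIGATE_TO_WAYPOINT'})
        # The macro-action is an actionlib server wrapped as a SimpleActionState;
        # its goal field 'target_pose' is filled from the userdata key of the same name.
        smach.StateMachine.add(
            'NAVIGATE_TO_WAYPOINT',
            smach_ros.SimpleActionState('navigate_to_waypoint', MoveBaseAction,
                                        goal_slots=['target_pose']),
            transitions={'succeeded': 'WAIT_FOR_GOAL',
                         'aborted': 'WAIT_FOR_GOAL',
                         'preempted': 'WAIT_FOR_GOAL'},
            remapping={'target_pose': 'target_pose'})
    return sm


if __name__ == '__main__':
    rospy.init_node('behavior_manager_sketch')
    build_behavior_manager().execute()

Each additional macro-action (Yield, Wait, Navigate Home, etc.) would be added as a further state with its own transitions, which is how the full FSM of Fig. 3 is assembled.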

3.3.2 Architecture of navigation behaviors

As said above, we have used ROS to implement the previous navigation macro-actions into a navigation stack that has been developed in the project9. This stack completely replaces the move_base ROS navigation stack. It follows the same architecture, but adds the components mentioned above for human-aware navigation. Figure 5 shows a simplified block diagram of the architecture. Many of the macro-actions mentioned above are implemented following a similar scheme. Besides receiving the information from the sensors and the perception modules, the main components are a global or high-level planner, which provides global paths or actions, and a

6http://www.ros.org 7http://wiki.ros.org/actionlib 8http://wiki.ros.org/smach 9https://github.com/robotics-upo/upo_robot_navigation


Figure 4: Part of the diagram of the finite state machine corresponding to the human-aware navigation macro-actions. Each macro shows the events leading to it, and the events that provoke a transition to a different macro.


Figure 5: A sketch of the general architecture for the navigation macro-actions. On the left, the typical architecture for robot navigation, with a two-level planning/control architecture. The TERESA architecture is similar, but adds new models, social costs and constraints (right). Furthermore, the different modules are tailored to each particular macro-action.

local planner/controller that also incorporates dynamic obstacle avoidance. Each macro-action may involve different social costs/restrictions and/or different models or implementations of the planners. When the Behavior Manager activates a particular macro-action, the navigation stack loads the required modules and models. This fits the objectives of TERESA, and allows the social models, cost functions and planners to be learnt from data for each specific application. In the following sections, the particularities of the different macros are explained.


4 Social waypoint navigation

4.1 Introduction

The first macro-action that was implemented is navigation between waypoints in the scenario. As commented before, human-aware navigation that respects human social conventions is required. Thus, the idea is to extend current state-of-the-art path planners with social cost maps. As the social navigation task is easier to demonstrate than to formalize mathematically, a learning approach is considered with the aim of achieving a social navigation behavior.

The macro itself is implemented using a classical three-level architecture: a global planner computes a global path, and then a local planning and a control layer deal with dynamic obstacle avoidance. First, the path planning architecture and the robot low-level control are explained. Then, the learning process devised to learn social navigation behaviors is presented. Finally, the results of an evaluation of the TERESA navigation system are shown.

4.2 Path planning

The current path planning system is based on the use of an Optimal Rapidly-exploring Random Trees (RRT∗) algorithm as a local planner [35]. This algorithm is extensively employed in robot planning. It is flexible and easily adapted to different scenarios and problems, and it implicitly reasons about collisions with obstacles at moderate computational cost even in high dimensionality. RRT∗ can explore the state space to obtain optimal paths with respect to cost functions. This way, we can change the behavior of the planner by adapting the cost functions involved. On top of it, an A∗ planner provides a global path towards the destination. The relation between the two levels is the following:

• Firstly, the A∗ global planner is employed to obtain a path from the initial robot position to the final goal. The plan is calculated in the map frame over the static map of the scenario; static obstacles not included in the map and dynamic obstacles are not taken into account. This path is computed only once, when a navigation goal is received.

• Secondly, the RRT∗ planner uses the previous plan to determine a local goal. This sub-goal is determined as the intersection point of the A∗ plan and a local area around the robot (approximately 10 x 10 m, with the robot in the center), as sketched below. Then, the RRT∗ plans a local path in the robot frame towards this local goal, taking into account all the people and the static and dynamic obstacles in the vicinity of the robot. Moreover, the RRT∗ space sampling is partially biased towards areas close to the A∗ plan. The sub-goal is updated and the RRT∗ path is re-planned at a frequency between 1 and 2 Hz.

Figure 6 shows a descriptive example of this two-level planning architecture. Finally, a low-level controller is employed to send the correct commands to the robot in order to follow the RRT∗ path.

The use of the RRT∗ planner in the scheme presented has some advantages. On the one hand, it plans in a fixed-size continuous space, unlike the A∗ planner, which needs a discretized space and whose performance depends on the resolution of the discretization. On the other hand, the random component of the planner allows learning and planning paths of different homotopies that are similarly valid for avoiding obstacles or people.
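As an illustration of the sub-goal computation mentioned above, the following sketch (with assumed function and parameter names, and a 10 x 10 m window) picks the point where the A∗ path leaves the local area around the robot; it is not taken from the actual stack.

def local_subgoal(global_path, robot_xy, half_size=5.0):
    """Return the RRT* local goal: the last point of the A* path that still lies
    inside a square window of side 2*half_size centred on the robot.
    global_path: list of (x, y) points in the map frame, ordered start -> goal.
    Sketch only: frame transformations and path re-entries are not handled."""
    rx, ry = robot_xy
    inside = lambda p: abs(p[0] - rx) <= half_size and abs(p[1] - ry) <= half_size
    subgoal = None
    for p in global_path:
        if inside(p):
            subgoal = p            # keep the most advanced in-window point
        elif subgoal is not None:
            break                  # the path has left the window
    # If no path point is inside the window, fall back to the final goal.
    return subgoal if subgoal is not None else global_path[-1]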


Figure 6: Capture of the navigation simulator for describing the planning architecture. The dark green line represents the path obtained by the A∗ planner to the final goal. The light green line represents the RRT∗ path in a local area. The light yellow line represents the laser sensor readings. The orange arrow represents the current local goal of the RRT∗ planner. Finally, the green cylinders marked as 2 and 4 represent people detected in the vicinity of the robot.

4.3 Low-level control

A controller based on pure pursuit path tracking has been extended to command velocities to the differential-drive robot so that it follows the path smoothly. The control law used to find the velocities is shown in Equation (1) for the linear velocity (v) and Equation (2) for the angular velocity (ω).

v = v_max · e^{−|θ|}    (1)

ω = ψ_av · θ    (2)

where θ is the angle (radians) between the current robot heading and the straight line that links the robot position and the next point of the RRT∗ path to reach, v_max is the maximum linear velocity allowed in m/s, and ψ_av is the proportional gain for the angular velocity, taking into account that the angular velocity will be in the range [0, ψ_av · π] (in absolute value).

This basic path tracker runs at a frequency of 15 Hz and has been extended to perform a collision detection check similar to the Dynamic Window Approach algorithm [25]. If the forward projection of the robot movement given by the control law is detected as a possible collision, the controller tries to find a valid command by sampling small variations of the given angular velocity. If a possible collision is still detected, a rotation in place in the correct direction is performed in order to avoid very close obstacles. Moreover, a linear decrease of the velocities when approaching the goal has also been added.
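A minimal sketch of the control law in Eqs. (1)-(2) is shown below; v_max and ψ_av are placeholders for the tuning values used on the robot, and the DWA-like collision check and the final deceleration are omitted.

import math

def pure_pursuit_cmd(robot_pose, target_point, v_max=0.6, psi_av=0.5):
    """Velocity command following Eqs. (1)-(2). Sketch only; the default
    v_max [m/s] and psi_av values are illustrative, not those of the robot.
    robot_pose: (x, y, yaw); target_point: next (x, y) point of the RRT* path."""
    x, y, yaw = robot_pose
    tx, ty = target_point
    # Angle between the robot heading and the line robot -> target, wrapped to (-pi, pi]
    theta = math.atan2(ty - y, tx - x) - yaw
    theta = math.atan2(math.sin(theta), math.cos(theta))
    v = v_max * math.exp(-abs(theta))      # Eq. (1): slow down when misaligned
    w = psi_av * theta                     # Eq. (2): proportional angular velocity
    return v, w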

4.4 Learning social navigation

As commented in the Introduction, one way to incorporate human-awareness into the motion planner is to include social cost layers that can rank those motions that are more socially

acceptable. Instead of engineering such cost layers, it is more natural to derive them from examples of good navigation behaviors. To this end, we have been collaborating with WP5 to derive the cost layers from examples and integrate them into the planner.

Here, a novel approach for learning robot navigation behaviors from demonstrations has been developed. We make use of Maximum Entropy Inverse Reinforcement Learning (IRL) concepts and sampling-based planners (in particular RRT∗ [35]) to identify the RRT∗ cost function that best fits the demonstration trajectories. Differently from classic IRL approaches based on Markov Decision Processes (MDPs), the presented method is computationally faster and scales very well with the state size, being able to deal with continuous state and control spaces, and it is general enough to be applied in different scenarios. This work has been published in [50], which can be consulted for further information. In WP5 (D5.3 and D5.5 [13, 14]), these ideas are analyzed in more detail, and new approaches for IRL with sampling-based planners are presented.

4.4.1 Learning a RRT* cost function

RRT∗ [35] is a technique for optimal motion planning. It considers that a cost function is associated to each point x in the configuration space. The algorithm seeks to obtain the trajectory ζ∗ that minimizes the total cost along the path, C(ζ). It does so by randomly sampling the configuration space and creating a tree towards the goal. The paths are then represented by a finite set of configuration points ζ = {x_1, x_2, ..., x_N}.

Without loss of generality, we can assume that the cost function for each point can be expressed as a linear combination of a set of functions, which will be called features: c(x) = Σ_j ω_j f_j(x) = ω^T f(x). The cost of a path is then the sum of the costs for all points in the path. Particularly, in the RRT∗, the cost of a path is the sum of the sub-costs of moving between consecutive pairs of points in the path:

C(ζ) = Σ_{i=1}^{N−1} [(c(x_i) + c(x_{i+1})) / 2] · ‖x_{i+1} − x_i‖    (3)
     = ω^T Σ_{i=1}^{N−1} [(f(x_i) + f(x_{i+1})) / 2] · ‖x_{i+1} − x_i‖ = ω^T f(ζ)    (4)

where the summation term in (4) is denoted f(ζ), the feature count of the whole path.

Thus, for given weights ω, the algorithm will return trajectories that try to minimize this cost.
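As an illustration, the path cost of Eqs. (3)-(4) can be computed as below; the feature callable and the array shapes are assumptions of this sketch, and feature normalisation is omitted.

import numpy as np

def path_cost(points, features, weights):
    """Cost of a path following Eqs. (3)-(4): per-point costs c(x) = w^T f(x) are
    averaged over consecutive pairs and weighted by the segment length.
    points:   (N, d) array with the path configurations x_1..x_N
    features: callable mapping a configuration to a (J,) feature vector f(x)
    weights:  (J,) weight vector w
    Returns the total cost w^T f(zeta) and the path feature count f(zeta)."""
    pts = np.asarray(points, dtype=float)
    f = np.array([features(p) for p in pts])                  # (N, J)
    seg_len = np.linalg.norm(np.diff(pts, axis=0), axis=1)    # ||x_{i+1} - x_i||
    f_path = (0.5 * (f[:-1] + f[1:]) * seg_len[:, None]).sum(axis=0)   # f(zeta)
    return float(weights @ f_path), f_path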

Given a set of demonstration trajectories D = {ζ_1, ζ_2, ..., ζ_D}, the problem of learning from demonstrations, in this setup, means determining the weights ω that lead our planner to behave similarly to these demonstrations. The concept of similarity in this context is ambiguous. One may want to exactly reproduce the same trajectories in the same situations. However, it is necessary to learn a representation that can generalize to other situations. As in [1, 40], this can be achieved by using the mentioned features as a measure of similarity. The objective is then to model the underlying trajectory distribution of the expert, p(ζ|ω), with the constraint that the expected value of the features for the trajectories generated by the model is the same as the expected value of the features for the given demonstrated trajectories:


E[f(ζ)] = (1/D) Σ_{i=1}^{D} f(ζ_i)    (5)

There are many distributions p(ζ|ω) that achieve the previous constraint. Applying the Maximum Entropy Principle [61, 38] to the IRL problem leads to the following form for the probability density of the trajectories returned by the demonstrator:

p(ζ|ω) = (1 / Z(ω)) · e^{−ω^T f(ζ)}    (6)

where the partition function Z(ω) = ∫ e^{−ω^T f(ζ)} dζ is a normalization term that does not depend on ζ. Thus, this model assumes that the expert is exponentially more likely to choose a trajectory with lower cost ω^T f(ζ). Under this model, the log-likelihood of the demonstrated trajectories is given by:

L(D|ω) = −D log(Z(ω)) − Σ_{i=1}^{D} ω^T f(ζ_i)    (7)

The gradient of the previous log-likelihood with respect to ω is given by:

∂L(D|ω)/∂ω = D · (1/Z(ω)) ∫ f(ζ) e^{−ω^T f(ζ)} dζ − Σ_{i=1}^{D} f(ζ_i) = D · ( E[f(ζ)] − (1/D) Σ_{i=1}^{D} f(ζ_i) )    (8)

Setting this gradient to zero, one arrives at the original constraint in (5). This means, as indicated in [41], that the IRL problem under the Maximum Entropy distribution is equivalent to maximizing the likelihood of the demonstrations when assuming an exponential distribution. Maximizing the likelihood (7) to obtain the set of weights cannot be done analytically, but the gradient (8) can be used to apply gradient ascent techniques to solve this problem. As mentioned in [40], this gradient can be intuitively explained: if the value of one of the features for the trajectories returned by the planner is higher than the value in the demonstrated trajectories, the corresponding weight should be increased to penalize those trajectories. The main problem with the computation of the previous gradient is that it requires computing the expected value of the features E[f(ζ)] for the generative distribution (6):

E[f(ζ)] = (1/Z(ω)) ∫ f(ζ) e^{−ω^T f(ζ)} dζ    (9)

where the integral is taken over the continuous space of trajectories. In [38], a probabilistic generative model for trajectories is derived from data, and the expectation is computed by Markov Chain Monte Carlo sampling methods, which are computationally very demanding.

One option is to approximate the previous expectation by the features of the most likely trajectory [40]. In our case, we will approximate the expert by the RRT* planner on board the robot. Being an asymptotically optimal planner, for some given weights the RRT* will provide the trajectory that minimizes the cost (the most likely trajectory) given infinite time. As the planning time is limited, the RRT* will provide trajectories with some variability in the features, and thus


the feature expectation E[f(ζ)] is computed by running the planner several times between the start and goal configurations. This is then used to compute the gradient and adapt the weights used in the RRT* planner. The experimental results will show that the method is able to recover the behaviors taught by the expert.

Figure 7: General learning scheme proposed to adjust the weights of the cost function of a RRT∗ planner.

Figure 7 shows the basic learning scheme proposed, described in more detail in Algorithm 1. The example trajectories from which we want to learn are used as input. The output of the method is the set of weights for the cost function of the RRT∗ algorithm in (3).

First, in Line 1 we obtain the average feature count f̄_D = (1/D) Σ_{i=1}^{D} f(ζ_i) from the example trajectories in the D scenarios. The feature counts are obtained as the sum of the feature values of pairs of nodes of the trajectory, evaluated similarly to Equation (3). Then we initialize the weights with random values (Line 2). It is noteworthy that the weights are not normalized during the learning iterations, so that changes in the value of one weight do not provoke variations in the values of the rest of the weights. They are normalized after the learning has finished. On the other hand, the feature values for each node are normalized for the sake of clarity of representation, but this is not a requirement of the algorithm.

The key point is the gradient given by (8), which requires a comparison of the feature counts obtained from the example trajectories and the expected value from the RRT∗ planner. The latter is obtained by running the planner rrt_repetitions times with the current weight values for each scenario considered (Line 6) and obtaining and normalizing the feature counts (Line 7). In Lines 9 and 11 the averaged values are obtained. Based on this comparison, the weights of the cost function are updated using exponentiated gradient descent (Line 13), as in [61]:


Algorithm 1: RRT*-IRL

Require: Trajectory examples D = {ζ_1^1, ..., ζ_D^S} in S scenarios
Ensure: Feature weights of the cost function ω = [ω_1, ..., ω_J]^T
 1: f̄_D ← calculateAvgFeatureCounts(D)
 2: ω ← randomInit()
 3: repeat
 4:   for each s ∈ S do
 5:     for rrt_repetitions do
 6:       ζ_i ← getRRTstarPath(s, ω)
 7:       f(ζ_i) ← calculateFeatureCounts(ζ_i)
 8:     end for
 9:     f̄_RRT*^s ← (Σ_{i=1}^{rrt_repetitions} f(ζ_i)) / rrt_repetitions
10:   end for
11:   f̄_RRT* ← (Σ_{i=1}^{S} f̄_RRT*^i) / S
12:   ∇L ← (f̄_RRT* − f̄_D)
13:   ω ← UpdateWeights(∇L)
14: until convergence
15: return ω

ω_i ← ω_i · e^{(λ/φ) · ∇L_i}    (10)

where φ is the number of the current iteration of the algorithm, λ is an adjusting factor, and ∇L_i = ∂L(D|ω)/∂ω_i is the i-th component of the gradient. Finally, the learning process finishes when the variations of the weight values stay under a certain convergence threshold ε.
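The learning loop of Algorithm 1 with the update of Eq. (10) can be sketched as follows; plan_and_count stands for a call to the RRT∗ planner that returns the feature counts of one planned path, and all hyper-parameter values shown are illustrative rather than those used in the experiments.

import numpy as np

def rrt_star_irl(demo_feature_counts, plan_and_count, n_scenarios,
                 rrt_repetitions=10, lam=1.0, eps=1e-3, max_iters=100, seed=0):
    """Sketch of Algorithm 1 (RRT*-IRL) with the exponentiated update of Eq. (10).
    demo_feature_counts: (J,) average feature counts of the demonstrations (f_D)
    plan_and_count(scenario, w): feature counts of one RRT* path planned with
                                 weights w in the given scenario (assumed helper)."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(0.1, 1.0, size=demo_feature_counts.shape)   # random init (Line 2)
    for it in range(1, max_iters + 1):
        per_scenario = []
        for s in range(n_scenarios):                            # Lines 4-10
            counts = [plan_and_count(s, w) for _ in range(rrt_repetitions)]
            per_scenario.append(np.mean(counts, axis=0))
        f_rrt = np.mean(per_scenario, axis=0)                    # Line 11: E[f(zeta)]
        grad = f_rrt - demo_feature_counts                       # Line 12: gradient
        w_new = w * np.exp((lam / it) * grad)                    # Eq. (10)
        if np.max(np.abs(w_new - w)) < eps:                      # convergence check
            return w_new / w_new.sum()                           # normalise at the end
        w = w_new
    return w / w.sum()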

4.4.2 Features for social navigation

The social navigation task considered here involves robot navigation in different house environments, like rooms and corridors, where some persons stand in different positions, so that the robot has to avoid them to reach the goal. A small set of well-known features has been considered here: the distance to the goal, the distance from the robot to the people in the scene, the angle α of the robot position with respect to the people, and the distance to the closest obstacle, as depicted in Figure 8.

Thus, five functions based on the presented features are combined to obtain the cost function employed in the RRT∗ planner. The functions are computed for each sample x_k of the configuration space. The first one is just the Euclidean distance from the robot position to the goal:

f_1(x_k) = ‖x_k − x_goal‖    (11)

The second feature function uses the distance to the closest obstacle for each node xk, with the aim of motivating the robot to keep some distance from the obstacles. A distance transform function is employed to obtain the distance to the closest obstacle for each sample.

f_2(x_k) = ‖x_k − x_closest_obs,k‖    (12)


Figure 8: Features employed in the social cost function learned. d1, distance to the goal. d2, distance from the people to the robot. α, angle between the person front and the robot location. d3, distance to the closest obstacle.

Figure 9: Gaussian mixture function deployed over each person. The lateral bar shows the costs based on the color displayed.


Then, three other feature functions representing a proxemics cost with respect to the persons in the environment are employed. These cost functions are defined by Gaussian functions and are inspired by the model used by Kirby et al. [37]. One Gaussian function is placed in front of the person, another one behind, and the last one on the right side of the person. The shape of the front and back Gaussian functions can be seen in Fig. 9. These cost functions depend on the distance d_jk and relative angle α_jk of the robot position x_k with respect to each person j in the scenario. The costs due to all persons in the scenario are integrated according to the following expressions, where P is the group of people:

f3(xk) = max{pfront(djk, αjk), ∀j ∈ P} (13)

f4(xk) = max{pback(djk, αjk), ∀j ∈ P} (14)

f5(xk) = max{pright_side(djk, αjk), ∀j ∈ P} (15)

Figure 9 shows the front and back cost functions for one person, which are implemented as two Gaussian functions: the first function is asymmetric and placed in front of the person, with σ_h = 1.20 (meters) the variance in the direction the person is facing and a smaller variance on the sides, σ_s = σ_h/1.5. The second Gaussian is placed behind the person, with σ_h = σ_s = 0.8. Finally, a third Gaussian is placed on the right side of the person, with σ_h = 0.8 and σ_s = 0.8/2.5. This function can be useful in corridor scenarios, where people usually tend to walk closer to the wall on their right.

The values of the n (5 in this case) feature functions are normalised, and the cost function for each node x_k is built by adding its weighted values, c(x_k) = Σ_{i=1}^{n} ω_i f_i(x_k), where ω_i ∈ [0, 1] and Σ_i ω_i = 1. Finally, the total cost along the N nodes of the path ζ is obtained from the motion-cost function employed by the RRT∗ algorithm to calculate the cost of moving from one node to the next, according to Equation (3).
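A sketch of how the person-related features (13)-(15) could be evaluated for one robot sample is given below. The asymmetric Gaussians are only an approximation of the Kirby-style person costs described above: the σ values are treated as standard deviations, and the front/back split and the sign convention for the right side are assumptions of this sketch.

import math

def asym_gaussian(d, alpha, sigma_h, sigma_s):
    """Gaussian cost at distance d and angle alpha from the Gaussian's main axis:
    sigma_h along the axis, sigma_s to the sides (sketch only)."""
    dx = d * math.cos(alpha)                       # component along the main axis
    dy = d * math.sin(alpha)                       # lateral component
    return math.exp(-(dx ** 2 / (2 * sigma_h ** 2) + dy ** 2 / (2 * sigma_s ** 2)))

def person_costs(d, alpha):
    """Front/back/right-side cost values for one person; alpha is the angle of the
    robot position w.r.t. the person's heading. Sigma values follow the text above."""
    front = asym_gaussian(d, alpha, 1.20, 1.20 / 1.5) if abs(alpha) <= math.pi / 2 else 0.0
    back = asym_gaussian(d, alpha - math.pi, 0.80, 0.80) if abs(alpha) > math.pi / 2 else 0.0
    right = asym_gaussian(d, alpha + math.pi / 2, 0.80, 0.80 / 2.5)
    return front, back, right

def social_features(robot_xy, people):
    """Features (13)-(15): maximum of the per-person costs over all people.
    people: list of (x, y, yaw) person poses in the same frame as robot_xy."""
    f3 = f4 = f5 = 0.0
    for px, py, pyaw in people:
        d = math.hypot(robot_xy[0] - px, robot_xy[1] - py)
        alpha = math.atan2(robot_xy[1] - py, robot_xy[0] - px) - pyaw
        alpha = math.atan2(math.sin(alpha), math.cos(alpha))   # wrap to (-pi, pi]
        fr, bk, rs = person_costs(d, alpha)
        f3, f4, f5 = max(f3, fr), max(f4, bk), max(f5, rs)
    return f3, f4, f5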

4.4.3 Experimental results

A set of experiments has been performed to evaluate whether the algorithm is able to recover the characteristics of the taught trajectories. All the presented experiments were performed using a library of RRT algorithms developed by the authors for research purposes. The library is available on the GitHub of the Service Robotics Lab10 under a BSD license. The hardware employed was an i7-3770 processor with 12 GB of DDR3 memory, and the planner was allowed to plan each path for 2 seconds.

In these experiments, we use the RRT∗ with a set of known weights in the cost function to generate the example trajectories in a set of scenarios as ground truth. Then, we use these trajectories to learn the cost function with the proposed algorithm in the same configurations. In particular, we employed 25 different configurations (different initial robot position, goal position, and different numbers of persons and positions) in different parts or rooms of a house map. Moreover, for each configuration, 25 RRT∗ example trajectories were recorded. To validate the approach, the trajectories from 15 of these configurations were used to learn the weights of the cost function, and the 10 remaining configurations were employed to compare the resulting paths. Figure 10 shows 3 of the configurations used in the validation.

Figure 11 shows the evolution of the normalised weight values, feature counts and gradients along the iterations of the learning algorithm.

10https://github.com/robotics-upo


Figure 10: Some of the scenarios employed in the cross-validation process. A coloured costmap based on the RRT∗ function cost is also shown.


Figure 11: (a) Evolution of the weights during the learning iterations. (b) Evolution of feature counts. (c) Gradients.


As can be seen, the weights converge to values close to the ground-truth ones in a few iterations, with a final error of around 16%. The difference in the feature count expectations, which is the optimization objective in (5), quickly approaches zero, and so the gradients and the weights stabilize. The relative error of the learnt weights with respect to the ground-truth weights is calculated as RE_ω = ‖ω̄_D − ω̄_RRT∗‖ / ‖ω̄_D‖ = 0.1620.

Once the learning has finished, we can compare the demonstration paths and the RRT∗ paths using the learnt weights in the remaining configurations for cross-validation. A qualitative comparison of the paths can be seen in Figure 12 for four of these trajectories (the rest are omitted for the sake of brevity). It can be seen that the behavior is reproduced very well in all the cases.

Figure 12: Visual comparison of the demonstration paths (red lines) and the RRT∗ paths obtained by using the learnt weights (blue lines). Trajectories 1, 5, 7 and 10 are presented from left to right.

We can also compare the ground-truth costs of the demonstration paths and the learnt RRT∗ paths. Figure 13 shows the averaged relative errors in the costs and in the feature counts. The error in the feature counts is under 8% in all the cases. The error in the costs is even lower, being under 4% in all the cases. Moreover, it can also be seen in Fig. 12 that some of the trajectories with larger cost error (1 and 10) reproduce the demonstrations very well.



Figure 13: (a) Relative errors in the costs of the demonstration paths and the RRT∗ paths using the learnt weights. (b) Relative errors in the feature counts.

4.5 Navigation evaluation

In this section, the waypoint navigation macro is evaluated by performing real experiments in the laboratory. First, the navigation functionality is assessed with an evaluation inspired by the benchmark of the European Robotic League for Social Robots (ERL-SR) used in the RockIn competition11. Secondly, an evaluation of the "social" capabilities of the system is presented. In both cases, a comparison with a classical navigation stack, where all entities in the scene are simple obstacles, is performed. To do so, the move_base framework from ROS is employed12.

4.5.1 Benchmarking according to ERL-SR

According to the ERL-SR Rulebook13, the navigation functionality benchmark assesses the robot's capability to correctly, safely, and autonomously navigate in an ordinary apartment. The task includes: navigation in an apartment-like environment with furniture, walls, and doors, i.e., in a previously mapped area; avoiding collisions with different types of unknown obstacles in unknown positions (not previously mapped); and navigating in the presence of people in the arena. The robot receives a list of waypoints that it has to follow in the given order.

The proposed testbed of the RockIn Home Competition is depicted in Figure 14. A similar apartment environment has been replicated at the Universidad Pablo de Olavide (Figure 14, right). In the living-room area some unmapped obstacles have been placed. The kitchen area also has people wandering around. A set of waypoints has been defined in the scenario, as indicated in Figure 15. The robot has to reach each waypoint following the established order. The evaluation of the navigation takes into account the following three metrics (a sketch of the waypoint error computation is given after this list):

• The distance between the robot's position and the respective position of the waypoint: both the Euclidean distance between the waypoint and the robot in meters (Pxy) and the difference in orientation in radians (Pyaw).

• The time spent by the robot to go from each waypoint to the next waypoint, in seconds (Tp).

• The number of times that the robot hits each obstacle (Hits).
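For reference, the two waypoint error metrics can be computed as in the following sketch; poses are assumed to be (x, y, yaw) tuples.

import math

def waypoint_errors(robot_pose, waypoint_pose):
    """Euclidean error Pxy (metres) and absolute orientation error Pyaw (radians)
    between the pose reached by the robot and the target waypoint (sketch)."""
    p_xy = math.hypot(robot_pose[0] - waypoint_pose[0],
                      robot_pose[1] - waypoint_pose[1])
    dyaw = robot_pose[2] - waypoint_pose[2]
    p_yaw = abs(math.atan2(math.sin(dyaw), math.cos(dyaw)))   # wrapped to [0, pi]
    return p_xy, p_yaw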

11http://rockinrobotchallenge.eu/ 12http://wiki.ros.org/move_base 13http://sparc-robotics.eu/wp-content/uploads/2016/04/ERL_Service.compressed.pdf


Figure 14: Scenarios for navigation evaluation. Left, testbed of the Rockin Home Competition. Right, testbed in the Pablo de Olavide University.

Figure 15: Waypoints (green dots) employed for the navigation functionality evaluation. An example of the paths followed by the TERESA system (red line) and the move_base system (blue line) is also shown.

We have also added an extra metric, the total path length in meters (Lp). Three runs were carried out with each of the systems. Table 1 shows the averaged metrics, with their standard deviations over the three runs, for each of the paths between waypoints.


Table 1: Rockin evaluation metrics for each waypoint

TERESA
        W1-W2          W2-W3          W3-W4           W4-W5           W5-W6           W6-W7          W7-W1
Tp      17.73 ± 1.70   14.95 ± 8.03   28.89 ± 12.36   31.38 ± 18.51   19.07 ± 14.70   23.57 ± 6.10   15.61 ± 4.31
Pxy     0.15 ± 0.03    0.39 ± 0.25    0.13 ± 0.02     0.16 ± 0.02     0.27 ± 0.22     0.22 ± 0.08    0.25 ± 0.03
Pyaw    0.25 ± 0.15    0.16 ± 0.14    0.29 ± 0.11     0.36 ± 0.03     0.16 ± 0.07     0.29 ± 0.09    0.17 ± 0.14
Lp      5.10 ± 0.11    4.65 ± 2.80    5.58 ± 3.33     6.32 ± 2.88     4.46 ± 0.68     4.89 ± 3.20    4.17 ± 0.83
Hits    0              0              0               0               0               0              0

MOVEBASE
        W1-W2          W2-W3          W3-W4           W4-W5           W5-W6           W6-W7          W7-W1
Tp      24.47 ± 13.11  14.25 ± 0.14   18.43 ± 13.84   17.82 ± 4.89    18.03 ± 9.98    73.48 ± 25.85  25.96 ± 19.07
Pxy     0.10 ± 0.03    0.09 ± 0.00    0.10 ± 0.00     0.37 ± 0.43     0.10 ± 0.01     0.35 ± 0.02    0.07 ± 0.06
Pyaw    0.09 ± 0.09    0.32 ± 0.09    0.22 ± 0.01     0.19 ± 0.18     0.30 ± 0.14     0.20 ± 0.28    0.30 ± 0.03
Lp      3.17 ± 0.24    5.00 ± 0.05    3.95 ± 0.85     5.56 ± 3.24     5.17 ± 2.62     6.82 ± 2.51    5.12 ± 2.86
Hits    0              0              0               0               0               0              0

Table 2: Rockin evaluation metrics

            Tp              Pxy          Pyaw         Lp            Hits
TERESA      151.20 ± 9.38   0.22 ± 0.14  0.24 ± 0.12  35.17 ± 0.17  0
MOVEBASE    192.44 ± 20.78  0.17 ± 0.18  0.23 ± 0.13  34.79 ± 0.17  0

As can be seen, there are no very significant differences between the two navigation systems. As expected, the time to reach the goal for most of the trajectories is a bit shorter in the case of move_base, since it tries to minimize the distance and time to reach the goal, unlike the TERESA system, which also takes into account people's comfort. In any case, the differences are small. Moreover, the precision in reaching the waypoints is slightly better in the case of move_base, by a small margin. Both navigation systems successfully reached all the goals without any hits.

Table 2 summarizes the results of the three runs over all the paths performed with each system (TERESA navigation system and ROS move_base system). The precision in reaching the waypoints in distance is slightly better in the case of move_base. However, the TERESA system has a better total average time to finish the trial. We observed during the trials that the move_base system spent more time rotating and trying to reach the correct goal position than TERESA. We also noted that move_base has some problems when the robot is surrounded by people or unknown obstacles, because the A∗ planner is not able to find a path to the goal. TERESA's RRT∗ local planner is more efficient for local obstacle avoidance. In terms of navigation functionality, we can say that the performance of both systems is quite similar.

4.5.2 Social evaluation

We also want to evaluate the social capabilities of the TERESA navigation system. To do so, we compare again with the ROS move_base system, which has no social reasoning. The evaluation tests have been recorded in the Robotics Lab of the Pablo de Olavide University. This lab has approximate dimensions of 4.60 m x 7.60 m and has an OptiTrack motion capture system installed, which has been used to record the experiments and provides good precision for the robot and people positions and orientations. The set of metrics considered for the social evaluation is the following:

• Time to reach the goal (Tp). Time in seconds from when the robot starts the navigation until the goal is correctly reached.


Tp = (Tgoal − Tini) (16)

• Path length (Lp). The length of the path followed by the robot from the initial point to the goal position (meters).

Lp = Σ_{i=1}^{N−1} ‖x_r^i − x_r^{i+1}‖_2    (17)

• Cumulative heading changes (CHC). It counts the cumulative heading changes in the robot trajectory, measured as the angles (radians) between successive waypoints. It gives a simple way to assess the smoothness and energy of the path [48], so a low value is desirable.

CHC = Σ_{i=1}^{N−1} (h_r^i − h_r^{i+1})    (18)

where h_r^i indicates the heading of the robot at position i. The angles and their differences are normalized between −π and π.

• Average distance to closest person (CPavg). A measure of the mean distance from the robot to the closest person along the trajectory (meters).

CP_avg = (1/N) Σ_{i=1}^{N} ‖x_r^i − x_cp^i‖_2    (19)

where x_cp^i indicates the position of the person closest to the robot at step i.

• Minimum and maximum distance to people (CPmin and CPmax, respectively). The minimum and maximum distances in meters from the robot to the people along the trajectory. They give an idea of the extent of the robot trajectory with respect to the people in the space.

CP_min = min{‖x_r^i − x_cp^i‖_2 | ∀i ∈ N}    (20)

CP_max = max{‖x_r^i − x_cp^i‖_2 | ∀i ∈ N}    (21)

• Personal space intrusions (CPprox). This metric is based on Proxemics theory, which defines personal spaces around people for interaction [28]. These areas are defined as:
– Intimate. Distance shorter than 0.45 m.
– Personal. Distance between 0.45 m and 1.2 m.
– Social. Distance between 1.2 m and 3.6 m.
– Public. Distance longer than 3.6 m.


Figure 16: Capture of the real experiments for the social navigation evaluation in the University Pablo de Olavide.

Thus, the metric classifies the distance between the robot and the closest person at each time step into one of the Proxemics spaces (in our case we use only three: Intimate, Personal and Social+Public), and obtains the percentage of time spent in each space over the whole trajectory:

CP_prox^k = [ (1/N) Σ_{j=1}^{N} F(‖x_r^j − x_cp^j‖_2 < δ^k) ] · 100    (22)

where N is the total number of time steps in the trajectory, δ^k defines the distance range for classification for k = {Intimate, Personal, Social+Public}, and F(·) is the indicator function.

For the waypoint navigation task, the desirable behavior should lead the robot to spend most of the time in the Social or Public distance range, but this depends on the dimensions of the space and the density of people in the area. For a small area shared with people, intrusions into the Personal area are acceptable, whereas for a big open space the robot should stay in the Social and Public distances. In any case, intrusions into the Intimate space should be avoided.

• Interaction space intrusions (ISprox). This metric is inspired by the work of Okal [48] on formalizing social normative robot behavior, and it is related to groups of interacting persons. It measures the percentage of time spent in the three Proxemics spaces considered with respect to an interaction area formed by a group of people that are interacting with each other. The detection of the interaction area of the group is based on the detection of F-formations. An F-formation arises whenever two or more people sustain a spatial and orientational relationship in which the space between them is one to which they have equal, direct, and exclusive access [55, 20].

IS_prox^k = [ (1/N) Σ_{j=1}^{N} F(‖x_r^j − x_f^j‖_2 < δ^k) ] · 100    (23)

where x_f^j is the center of the formation or group of people f closest to the robot at step j.
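As an illustration, the intrusion metrics of Eqs. (22)-(23) can be computed per trajectory as sketched below; the array shapes and the banded (rather than cumulative) interpretation of the thresholds are assumptions consistent with the result tables.

import numpy as np

INTIMATE, PERSONAL = 0.45, 1.2     # Proxemics thresholds in metres [28]

def proxemics_intrusions(robot_xy, reference_xy):
    """Percentage of time steps spent in the Intimate / Personal / Social+Public
    bands w.r.t. a reference point: the closest person for CP_prox (Eq. 22) or the
    centre of the closest F-formation for IS_prox (Eq. 23). Sketch only.
    robot_xy, reference_xy: (N, 2) arrays, one row per time step."""
    d = np.linalg.norm(np.asarray(robot_xy) - np.asarray(reference_xy), axis=1)
    n = float(len(d))
    intimate = 100.0 * np.sum(d < INTIMATE) / n
    personal = 100.0 * np.sum((d >= INTIMATE) & (d < PERSONAL)) / n
    return intimate, personal, 100.0 - intimate - personal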



Figure 17: Static scenario for the social evaluation, along with some of the paths obtained by the TERESA system and the move_base system. Left: two people looking at and talking to each other (forming a group), represented by magenta circles with blue arrows indicating their orientation. Right: two people facing the walls (not a group). The red line shows the path followed by the TERESA system. The blue line shows the path of the move_base system.

Two sets of experiments have been performed: one in a static scenario and another in a dynamic scenario. In the case of the static scenario we consider two different situations: first, a group of two people talking to each other in the middle of the scene, as can be seen in Figure 16; second, two people who are in the scenario but are not interacting with each other. Figure 17 shows two examples of the proposed scenarios, where the people positions and orientations are shown. Moreover, the paths followed by the robot using the TERESA navigation system or the move_base system are also depicted. As can be seen, the TERESA system avoids interfering with the group, showing a more acceptable behavior from the social point of view.

Tables 3 and 4 show the metrics, with their standard deviations, obtained for four runs of each navigation system in the static scenario with two individual people and with a static group of two people, respectively.

On the one hand, as can be observed, in the case of two people not forming a group the metrics do not present very significant differences (note that the ISprox metric is not applicable because there is no interaction space). The small space available to perform the navigation and the people positions do not allow for big differences in the metrics considered, and the resulting paths are very similar, as Fig. 17 shows. In any case, some metrics can be remarked upon. The minimum distance to people is higher in the case of the TERESA system, which indicates that the system was always able to keep some distance from the person. Moreover, the TERESA system spent less time in the Personal space in favour of the Social space, which indicates a better social behavior.

On the other hand, the case of a group of people presents even clearer differences. The move_base system performs clear intrusions into the Intimate space with respect to the closest person and also into the interaction space of the group. Again, the minimum distance to people is higher in the case of TERESA. These results show that the TERESA navigation system is more respectful of people's spaces and of the interaction space of the group from a social point of view.

Another test has been recorded in a dynamic situation. In this case, two people walk freely in the scenario, sometimes forming a group and sometimes walking alone. At the same time, the robot receives waypoints that lead it to cross the room avoiding the people around. One long run for each navigation system was performed.


Table 3: Results for social navigation evaluation in a static scenario with two non-interacting people

NO GROUP    Tp             Lp            CHC           CPavg         CPmin         CPmax
TERESA      22.70 ± 1.52   4.86 ± 0.40   4.54 ± 0.79   1.74 ± 0.09   1.08 ± 0.16   2.47 ± 0.13
MOVEBASE    24.18 ± 1.73   4.92 ± 0.23   4.37 ± 0.40   1.90 ± 0.14   0.63 ± 0.07   2.82 ± 0.01

            CPprox
            Intimate      Personal       Social+
TERESA      0.00 ± 0.00   12.77 ± 1.02   87.23 ± 0.87
MOVEBASE    0.00 ± 0.00   17.24 ± 0.98   82.76 ± 0.98

Table 4: Results for social navigation evaluation with a static group of two people

GROUP       Tp             Lp            CHC           CPavg         CPmin         CPmax
TERESA      23.65 ± 0.64   5.45 ± 0.27   4.18 ± 2.55   1.87 ± 0.08   0.79 ± 0.17   2.71 ± 0.11
MOVEBASE    23.15 ± 1.34   5.02 ± 0.08   4.41 ± 0.08   1.88 ± 0.12   0.42 ± 0.15   2.74 ± 0.17

            CPprox                                        ISprox
            Intimate      Personal       Social+         Intimate      Personal       Social+
TERESA      0.00 ± 0.00   17.83 ± 1.52   82.17 ± 1.52    0.00 ± 0.00   0.00 ± 0.00    100.00 ± 0.00
MOVEBASE    2.60 ± 3.67   14.50 ± 3.06   82.90 ± 0.61    7.64 ± 0.70   13.00 ± 0.01   79.37 ± 0.71

One long run for each navigation system was performed. Figure 18 shows the paths followed by the people in one of the runs (right) and the paths followed by the robot with the TERESA system and the move_base system (left). In this experiment the people try to cover all the space and perform different trajectories in order to record as many different navigation situations as possible. As a first result, just by observing the left image of Figure 18, it appears that the TERESA system adapted its paths to the people much more than the move_base system, which tried to perform the same direct trajectory to the goal many times.

Table 5 shows the metrics obtained in the experiment. In this case, the total time Tp indicates the total duration of the run, with the TERESA recording being slightly longer than the move_base recording. Under this condition, the path length metric Lp indicates the total length travelled by the robot (over all trajectories), and, as can be seen, it was longer in the TERESA case. This fits with the paths shown on the left of Figure 18, and explains the higher value of the cumulative heading changes (CHC), since the TERESA system tried to adapt its paths to the people in the scene. The minimum distance to people (CPmin) was slightly higher with TERESA, even though the average distance to people (CPavg) was the same. On the other hand, the most remarkable and significant metrics are again the intrusions into the Proxemics spaces of the closest person (CPprox) and into the interaction area (ISprox). The intrusions into the intimate spaces are much lower for the TERESA system, which reflects its effort to minimize the disturbance of people. As a general result, we can say that the behavior of the TERESA system is more socially acceptable than that of move_base.

5 Yield

The robot should yield the way to people when crossing narrow places like doorways or corridors at the same time as them (this event is what we call no social path available). If a person is crossing the path of the robot in one of those narrow areas, the robot should retreat to a position that allows the person to come through first. Once its path is free again, the robot resumes its previous navigation.



Figure 18: Representation of the paths followed in the dynamic free run experiment. Left: paths followed by the robot using the TERESA and move_base navigation systems. Right: Paths followed by the two people in the scenario.

Table 5: Results for social navigation evaluation of a free run with two people moving in the scenario.

            Tp       Lp      CHC     CPavg   CPmin   CPmax
TERESA      495.50   70.24   89.16   2.50    0.31    4.65
MOVEBASE    387.80   50.18   74.34   2.50    0.27    4.84

            CPprox                          ISprox
            Intimate   Personal   Social+   Intimate   Personal   Social+
TERESA      0.60       14.31      85.08     3.31       12.27      84.42
MOVEBASE    3.17       10.11      86.73     8.51       12.67      78.81

The yield macro-action is performed in two different ways in the TERESA architecture (a schematic sketch of the doorway case is given after this list):
• Doorways. A pre-defined behavior has been programmed for these particular places. In this case, the areas around the doorways have been marked on the map beforehand. Thus, when the robot is inside a yield area and detects someone in that area coming through in its passing direction, the Yield macro-action is triggered. The robot then moves to the closest yield point (these are also pre-defined, as depicted in Figure 19) and waits for some time. Once the waiting time has expired, the presence of people standing in the yield area is checked again. If no one is there, the robot resumes its navigation.
• Corridors and other narrow places. These areas are not marked on the map, so the yield action is not triggered explicitly. Instead, the combination of the social navigation costs related to people and the Wait macro-action provokes a 'natural' yield. In general, the robot avoids approaching someone head-on and tends to approach slightly from the side. In this situation, if the robot is blocked because no path is available, the Wait macro-action is triggered. As a result, the robot waits (not in front of the person) until the path is free, which lets the person pass through.
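The doorway variant can be summarized by a simple check-retreat-wait loop. The sketch below is a schematic rendering only; the callables passed in (in_yield_area, person_in_area, and so on) are hypothetical stand-ins for the perception and navigation interfaces of the actual stack.

```python
import time

def yield_at_doorway(in_yield_area, person_in_area, closest_yield_point,
                     go_to, resume_navigation, wait_time=5.0):
    """Schematic doorway yield behavior built from placeholder callables
    (not the TERESA navigation API)."""
    if in_yield_area() and person_in_area():
        # Retreat to the closest pre-defined yield point next to the door.
        go_to(closest_yield_point())
        # Wait, then re-check the yield area until no one is standing in it.
        while person_in_area():
            time.sleep(wait_time)
        # Path is free again: resume the previous navigation goal.
        resume_navigation()
```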


Figure 19: Pre-defined yield behavior in doorways. The robot (blue) finds someone in a yield area (pink) who will come through its path (green). Red arrows indicate the possible positions where the robot should wait until the person goes through the door and leaves the narrow area.

6 Approaching people

6.1 Introduction

When the TERESA robot needs to start a conversation with a person, the first step is approaching him/her. The navigation stack offers the functionality of approaching the person autonomously. Whenever the visitor indicates through the interface that she/he wants to engage in a conversation with someone (by clicking on the screen), the behavior manager activates this macro-action.
Approaching a person should respect certain social constraints, such as not approaching the person from behind. Instead of handcrafting a solution to the task, the idea is to learn how to perform the task from a human. For that, we have used the TERESA robot teleoperated by an expert, who was asked to demonstrate the task repeatedly and as accurately as possible by piloting the robot. By using an imitation learning paradigm, we want to incorporate the implicit social constraints into our sample-based planner when the macro-action is activated.
To this end, we apply the methods described in Deliverable D4.2 [10]. In particular, we explain in this section how we use the Gaussian Mixture Models (GMMs) described there to model the interaction between the robot and a person for the approaching task. After the models are learned, they are integrated into a sample-based planner, an optimal Rapidly-exploring Random Tree (RRT*), at two levels: as a cost function in order to plan trajectories considering behavior constraints; and as a configuration space sampling bias to discard samples with low cost according to the behaviors [52].
In general, Gaussian Mixture Models offer a flexible framework to model the relationships between the relevant features that arise when the robot performs a particular navigation task. GMMs are a well-suited representation for unsupervised extraction of continuous feature distributions, and they have also shown their utility as a model for robot skill representation in Programming by Demonstration (PbD) settings [4]. Differently from [4], here we aim to use GMMs to incorporate that knowledge into our RRT* planner. Our aim is to find a safe path which imitates a behaviour by remaining within statistically determined constraints.
In [6], the authors present a PbD framework in which GMMs are used to retrieve the statistical constraints of several demonstrations of a particular task, in a manner similar to the approach in [5]. After that, a sample-based planner based on RRT is used. In order to adapt the GMM

model to the planner, that approach makes use of a temporal criterion that relies on mixture components sorted in time. Thus, the growth of the RRT tree is guaranteed to follow the way in which the demonstrated trajectories were shown to the robot, avoiding connection attempts between separated mixture components. This prevents a single model from including several homotopy classes, which is a drawback, as some navigation tasks can be solved by recurring to different homotopy classes. Two paths between an origin and a goal belong to the same homotopy class if they can be continuously deformed one into the other within the free configuration space.
Figure 20 shows the trajectories demonstrated by an expert controlling TERESA to approach a standing person with an arbitrary orientation. From the demonstrations it can be seen that when the standing person has his/her back turned to the robot, the demonstrated trajectories do not follow the shortest path. Rather, the robot tends to take curved paths that bring its approach into the person's field of view without coming very close to the person's back. Furthermore, two homotopy classes can be seen there, corresponding to approaching the person from the left or from the right. Our approach naturally handles this by integrating the RRT* planner, in which the optimization of the cost function at every step allows automatic selection of the homotopies included in the GMM model.

Figure 20: Trajectories retrieved from the demonstration experiments. The initial position of the robot (blue square), the paths performed, and the position and orientation of the person (yellow triangle) are shown. The goal state (green star) is also depicted.

6.2 GMMs for interaction modeling

A proper choice of the relevant features f = [f_1, f_2, ..., f_n]^T when encoding a particular navigation task is crucial, as it provides part of the solution to the problem of defining what is important to imitate. In [51], we presented an analysis of the features used to model the "Approaching Person" macro-action. In that work, different sets of features were proposed in order to find the most relevant ones for effectively describing this task. To do this, a supervised learning approach with an AdaBoost classifier was used.
Based on the results and conclusions collected there, we have considered as features the distance and the relative angle between the robot and the person in the scene, logged with a timestamp. The features extracted at each time instant are the distance to that person (d) and the relative angle (θ). The angle values are continuous measures in [−π, π), negative clockwise (with respect to the person's gaze). Thus, a set composed of N datapoints ζ = {ζ_j}_{j=1}^N of dimension D = 2 is considered, where time is left out because the dynamics of the behaviors are not considered.
From the demonstrated trajectories of the features, D, it is possible to extract a GMM, so that

the probability of a particular combination of feature values is given by:

$$p(f|D) = \sum_{i=1}^{K} \omega_i\, \mathcal{N}(f; \mu_D^i, \Sigma_D^i) \qquad (24)$$

with K Gaussian modes. The GMM is then described by the set of parameters {ω_i, μ_D^i, Σ_D^i}_{i=1}^K, respectively representing the prior probabilities, centers and covariance matrices of the model. The prior probabilities ω_i satisfy ω_i ∈ [0, 1] and Σ_{i=1}^K ω_i = 1. Further details about the parameter learning and the model selection are described in Deliverable D4.2 [10], where this approach was first introduced within the context of the TERESA Project.
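As an illustration of Eq. (24), the snippet below fits such a mixture over logged (d, θ) pairs with scikit-learn and evaluates the likelihood of a candidate feature combination. The data file name is a placeholder; K = 19 matches the number of modes reported in Section 6.4.1, but the actual model selection follows D4.2.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# (N, 2) array of [distance d, relative angle theta] datapoints extracted
# from the teleoperated demonstrations; the file name is a placeholder.
features = np.load("approach_features.npy")

# K = 19 modes, as used for the experiments reported in Section 6.4.1.
gmm = GaussianMixture(n_components=19, covariance_type="full").fit(features)

# Likelihood p(f | D) of a candidate feature combination f = [d, theta],
# as in Eq. (24); score_samples returns the log-density.
f = np.array([[1.5, 0.3]])
likelihood = np.exp(gmm.score_samples(f))[0]
```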

6.3 The reproduction planner

Once we have the GMM that encodes the relation between the relevant features of the approaching task, we describe here how the model is used within our motion planner, the RRT* described in Section 4.2. We extend the standard RRT* algorithm with the learned GMM at two levels (a small sketch of both extensions is given after this list):
1. Including a new task-similarity cost in the evaluation of the nodes' costs. The GMM obtained encompasses the most likely configurations of the task. Thus, when a node is proposed to be added to the RRT* tree, a cost based on the GMM is derived and used. The objective is to increase the cost of those configurations that are unlikely according to the learnt GMM, so the likelihood is inverted to obtain the cost. To make this cost term commensurate with the rest of the cost functions, the likelihood is truncated at a certain low value δ.
2. Providing the planner with the most likely subspace in which to perform the sampling of the RRT*. If the RRT* knows which are the most likely paths, then we can bias the sampling of the configuration space towards those areas, reducing the probability of sampling useless states and, hence, reducing the computational cost of obtaining a solution. To this end, a parameter defines the percentage of uniform and GMM-based sampling to be used. In this way, the approach presented here can still take advantage of random sampling.
These changes still guarantee the probabilistic completeness of the modified RRT* planner presented in this work. Although the sampling method is now based on a GMM and is not a uniform strategy, if the execution time tends to infinity then even states which are far away from the most probable sampling area have a small probability of being selected. Besides this, our algorithm allows the use of uniform sampling jointly with GMM-based sampling. As mentioned before, this is selectable by a parameter that defines the percentage of times that each sampler is used. The use of this mixed sampling strategy allows us to reach the goal state more quickly than using only GMM-based sampling when we cannot afford to plan in a way similar to the demonstrations.
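A minimal sketch of the two extensions follows, assuming a fitted gmm object such as the one above and leaving out the rest of the RRT* machinery (tree growth, rewiring, the remaining cost terms); the constants are illustrative.

```python
import numpy as np

DELTA = 1e-3      # truncation value for the likelihood (delta in the text)
GMM_BIAS = 0.95   # fraction of samples drawn from the learned model

def gmm_cost(gmm, f):
    """Task-similarity cost of a candidate node with feature vector f:
    the GMM likelihood is truncated from below and inverted, so unlikely
    configurations become expensive."""
    likelihood = max(np.exp(gmm.score_samples(f.reshape(1, -1)))[0], DELTA)
    return 1.0 / likelihood

def sample_features(gmm, rng, low, high):
    """Mixed sampling strategy: with probability GMM_BIAS draw from the
    learned model, otherwise sample uniformly within the feature bounds."""
    if rng.random() < GMM_BIAS:
        return gmm.sample(1)[0][0]
    return rng.uniform(low, high)
```

The 95%/5% split used here corresponds to the mixed-sampling setting reported later in Section 6.4.3.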

6.4 Experimental results

The following sections describe how the data were collected, the metrics used to assess the performance of the framework, and a brief


Figure 21: A view of the general setup used to collect the data.
comparison with an existing framework [6]. We also discuss two important aspects regarding the suitability of both frameworks for planning: the management of the homotopies when solving the task, and the generalization needed to deal with situations far from the learned examples.

6.4.1 Gathering the data for learning

Several experiments were performed to retrieve exemplary trajectories to feed the GMM learning phase. Those experiments took place in an empty room, free of obstacles between the person and the robot, while performing the scene described in Section 6.1. A set of 9 trials for each homotopy in each task was logged. The study was recorded using a motion capture system (OptiTrack), and the robot's and person's poses were extracted automatically from the data (see Fig. 21). At points where the automatic tracking failed, the data were filled in by specific interpolation tools in a post-processing step.
The gathered data were used to derive the GMM models of the features using the method described in Section 6.2. Those models were then integrated into the RRT* as explained in Section 6.3. This method is called GMM-RRT*. A set of GMM models in x−y space was also derived in order to test the GMM-RRT planner [6], used as a baseline. The parameters used were K = 19 GMM modes (for each task and method) and a value of δ = 0.001.

6.4.2 Metrics

We propose two metrics to compare the paths obtained from the different planners with respect to the demonstrated trajectories, as used in [32]. The first metric is the Trajectory Difference Metric (TDM), defined as follows:

$$TDM(\zeta_D, \zeta_P) = \frac{1}{|\zeta_D|} \sum_{i=1}^{|\zeta_D|} \min_{j = 1, \ldots, |\zeta_P|} \overline{\zeta_D(i)\,\zeta_P(j)} \qquad (25)$$
where ζ_P(j) and ζ_D(i) are the points of the two trajectories ζ_P and ζ_D to be compared, and the overlined term denotes the distance between the two points. |ζ_D| and |ζ_P| are the numbers of samples of each trajectory. This metric gives an idea of the similarity of two given trajectories. The final metric is given by the averaged value of this metric over all the planned and demonstrated trajectories:


Figure 22: Mean cost value and standard deviation obtained for the GMM-RRT* with (red and green, subscript bc) and without (blue, subscript c) GMM sampling bias.

$$\overline{TDM} = \frac{1}{|D|}\frac{1}{|P|} \sum_{D, P} TDM(\zeta_D, \zeta_P) \qquad (26)$$
where D and P range over the demonstrated and planned trajectories, and |D| and |P| are their respective numbers.

The second metric is the averaged trajectory length ratio l_e, expressed as the mean, over all pairs, of the absolute difference between the planned and demonstrated trajectory lengths divided by the demonstrated trajectory length:

$$l_e = \frac{1}{|D|}\frac{1}{|P|} \sum_{D, P} \frac{|l(\zeta_D) - l(\zeta_P)|}{l(\zeta_D)} \qquad (27)$$
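Both metrics are straightforward to compute from sampled trajectories; the following NumPy sketch (ours, with trajectories given as (N, 2) arrays) makes the definitions concrete.

```python
import numpy as np

def tdm(demo, plan):
    """Trajectory Difference Metric (Eq. 25): mean distance from each
    demonstrated point to the closest point of the planned trajectory."""
    dists = np.linalg.norm(demo[:, None, :] - plan[None, :, :], axis=2)
    return dists.min(axis=1).mean()

def path_length(traj):
    return np.linalg.norm(np.diff(traj, axis=0), axis=1).sum()

def averaged_metrics(demos, plans):
    """Averaged TDM (Eq. 26) and length ratio l_e (Eq. 27) over all pairs
    of demonstrated and planned trajectories."""
    tdms, ratios = [], []
    for zd in demos:
        for zp in plans:
            tdms.append(tdm(zd, zp))
            ratios.append(abs(path_length(zd) - path_length(zp)) / path_length(zd))
    return np.mean(tdms), np.mean(ratios)
```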

6.4.3 Results

Evaluation

This section evaluates the convergence speed of the RRT*-based planner to the optimal path in terms of cost. Figure 22 shows the evolution of the solution path cost versus the number of iterations using 100% and 0% GMM sampling bias, for the “Approaching a Person” task. The allowed planning time for this comparison was 100 seconds, in order to converge to the optimal cost. It can be seen in this example that the planner including the GMM sampling converges faster.

Metrics performance

For the following results, a mixed-sampling strategy was adopted: 95% of the time GMM-based sampling was employed, while for the remaining 5% the sampling was uniform. We aim to take advantage of the learned models while still allowing a degree of randomness when sampling configurations. This feature is only applicable to the GMM-RRT* planner: the construction and use of the GMM model in the GMM-RRT planner must satisfy certain restrictions that make sampling possible only on consecutive nodes. For further details please consult [6].


Figure 23: Demonstrated (green) and planned trajectories (red for GMM-RRT* and blue for GMM-RRT). The “Start” and “Goal” states are also shown, and the person's gaze is represented by a triangle.
The planners are given enough planning time to converge. Figure 23 shows a complete set of 25 trials for each homotopy and planner, with the parameters explained before. Regarding how well both planners imitate the demonstrated trajectories, Table 6 shows the values obtained when the different planners are used in the approaching task, based on the metrics described in Section 6.4.2. It can be seen in the table that the planner proposed in this work outperforms the baseline in nearly every homotopy considered.

Approaching a person
                    Planner     TDM (m)            le (%)
Right homotopy      GMM-RRT*    0.0565 ± 0.0152    3.66 ± 2.64
                    GMM-RRT     0.0803 ± 0.0205    4.00 ± 2.51
Left homotopy       GMM-RRT*    0.0484 ± 0.0109    4.96 ± 2.37
                    GMM-RRT     0.0676 ± 0.0169    4.31 ± 3.10
Frontal homotopy    GMM-RRT*    0.0376 ± 0.0141    11.35 ± 5.35
                    GMM-RRT     0.0337 ± 0.0184    8.55 ± 3.34

Table 6: Trajectory quality approaching a person. Smaller values are better for all metrics. The best values are highlighted in boldface.

Managing the homotopies

As discussed in Section 6.1, the task being learned by the robot may be composed of several homotopies. Figure 24 illustrates the models of the Approaching a Person task.

Figure 24: First-left: the model includes the three demonstrated homotopies. Second-left to right: one model per demonstrated homotopy.


The need for one model per homotopy is a clear disadvantage of the GMM-RRT planner. Not only does the social situation have to be known beforehand in order to choose between the homotopy models, but also, once a model has been selected, an obstacle may hamper the execution of the plan. In such a situation, both planners can take advantage of the variability in the execution of the demonstrated task, encoded by the GMM covariance matrices, to avoid the obstacle, but sometimes this may not be enough.

Generalization

Regarding the generalization of the proposed approach to other scenarios, the approaching-a-person task is modified so that an obstacle (the rectangular shaded area in Fig. 25) is introduced in the robot's path to its goal. The GMM-RRT* planner can still reach the goal thanks to the mixed-sampling strategy adopted: the uniform sampling component allows for random exploration of the state space outside the learned demonstrations.

Figure 25: Left: simulated trial without uniform sampling. Right: 95% GMM and 5% uniform sampling.

The simulations shown in Fig. 25 were run over a 100-second time horizon. With 100% GMM sampling, the planner was not able to find a path around the added obstacle. If a mixed-sampling strategy is allowed, with the same time horizon, the goal is reached successfully. The percentage of bias thus offers a trade-off between the imitation capabilities and the planning time required to obtain a path. An adaptive solution that modifies this bias after obtaining a first good path is left for future work.


7 Walking side-by-side

7.1 Introduction

In this task, the robot starts engaged in conversation with a person, whom we denote as the target. Then, that person decides to move somewhere else, so the robot has to accompany her/him while conversing. The idea is that both the robot and the person walk together towards the person's goal while they maintain a conversation. The robot should navigate avoiding obstacles and following the target in a social way; otherwise it would not be perceived as natural. For that, it is useful to take into account the positions and velocities of other people around, the target and her/his goal, and additional obstacles, so that the motion of the robot can be adapted to the situation. Furthermore, this is a joint task between the person and the robot and, thus, the actions of each affect the other. For instance, the person will wait for the robot (or vice versa). Therefore, for this particular task we have developed a new low-level controller that tries to consider the social effect of its actions on the target and other persons, as well as the effect of others' actions on it. In order to model such effects, we have resorted to Social Force Models (SFMs). They model the local interaction between social agents, and have been used extensively for modeling the motion of crowds [46, 47] and also as models for social navigation. These models require knowledge of the poses of the different persons, as well as their destinations.
The perception system of the robot provides most of that information, but there are many uncertainties associated with it, mainly with respect to people's intentions, but also with the perception itself. It is important to deal with such uncertainties. Partially Observable Markov Decision Processes (POMDPs) provide a principled mathematical framework for planning under uncertainty. They have been considered for different navigation tasks [2, 27]. However, they face computational and scalability issues with complex models. Indeed, many POMDP-based approaches only deal with discrete representations of the environment or compute plans offline, which is not ideal in highly dynamic scenarios such as ours.
We propose a hierarchical approach to solve this walking side-by-side task. We combine a POMDP-based planner, which deals with uncertainties related to the target's positions and intentions, and a lower-level controller, which receives commands from the POMDP and makes the robot execute the plan while avoiding obstacles and following the target in a social manner. Thus, we alleviate the complexity of the POMDP model, which focuses on uncertainties associated with the target. The POMDP uses an online solver (POMCP) to provide new plans as the robot executes the task and considers a continuous representation of the environment. Instead of enumerating all possible states and probability transitions, we alleviate that issue (also known as the curse of dimensionality) by using a generator function that simulates the POMDP model. That simulator allows us to use continuous and discrete representations seamlessly, without the need to define the complete model explicitly. In particular, it models social movements for the target and the robot with the mentioned SFM. Moreover, POMCP is an anytime algorithm, i.e., we can stop the planning at any time and recover the best policy so far. The low-level controller receives high-level commands from the POMDP (i.e., whether it should accompany the person on the right or left, etc.) and navigates the robot socially with an obstacle avoidance algorithm based on the SFM.
This controller also takes into account other people around, not only the target, in order to navigate the robot socially. In our task, we want to consider social navigation in dynamic and complex environments, with many humans walking around; uncertainties in robot sensors and actuators; and uncertainties in people's intentions, which requires predicting human motion. We propose our hierarchical approach based on POMDP because we want to consider uncertainties and deal with multiple people

walking around. We split that problem by reasoning about the uncertainties on the target within the POMDP, and considering additional people within the SFM-based controller.
In the literature, there have also been other works using POMDPs and social models to solve the task of walking together with a person to a common destination. In [24], the authors use the SFM for social navigation in a walking side-by-side task, and they include the prediction of people's intentions. They select among a set of predefined destinations in a geometrical fashion, but they do not plan by reasoning over uncertainties and do not consider kinodynamic constraints. In [23], the same geometrical method is used to predict people's intentions, which is then combined with an SFM to estimate people's movements and navigate the robot while reducing its impact on people; an RRT-based planner taking into account kinodynamic constraints is proposed. The POMCP solver is also applied to a similar task in [27]. The main problem reported in [27] when using POMCP is that the robot moves too slowly in real time, not being able to follow the person. This is due to the discretization of the actions (north, west, and so on), so the authors propose to command goal positions farther ahead and reach them with the navigation system without unnecessary turns. In contrast, we model social movements for humans with the SFM, and we reason about the target's intentions. Moreover, they do not always use the actions planned by the POMCP: when the person is visible, they use a heuristic method to follow her/him; when the person is not visible, a probabilistic estimation of her/his position is used to select the likeliest point as the robot's next goal.

7.2 Preliminaries

7.2.1 Social force model

The Social Force Model, initially presented in [30] and widely extended in different works [46, 47, 23], aims to model the motion of pedestrians in the presence of other pedestrians, obstacles, etc. The SFM models pedestrians as particles acting under the influence of a set of social forces in the environment. These forces can be attractive or repulsive. The first type represents, for instance, the pedestrians' intentions, trying to reach their goals or walking close to friends in a group. The repulsive forces can be used to model the comfort distance between strangers, as well as the security distance to obstacles during the execution of trajectories.
We use the Social Force Model presented in [46, 47], which can be summarized as follows: the instant velocity v_α(t_i) and 2D position r_α(t_i) of a pedestrian α trying to reach a desired goal r_α^goal as comfortably (socially) as possible are given by:

$$v_\alpha(t_i) = v_\alpha(t_{i-1}) + F_\alpha(t_i)(t_i - t_{i-1})$$

$$r_\alpha(t_i) = r_\alpha(t_{i-1}) + v_\alpha(t_i)(t_i - t_{i-1})$$
where F_α(t_i) is the instant force on pedestrian α at time instant t_i. Considering a unitary mass for the pedestrian, the force is equal to the acceleration. It is important to mention that the norm of the pedestrian's velocity is saturated to a maximum value v_α^max that represents the desired velocity of pedestrian α to reach the destination.

The force Fα(ti) is obtained as follows:

$$F_\alpha(t_i) = K_1 F_\alpha^{goal}(t_i) + K_2 F_\alpha^{social}(t_i) + K_3 F_\alpha^{obstacle}(t_i) + K_4 F_\alpha^{group}(t_i)$$


Figure 26: Some of the forces of the Social Force Model: red, goal force F_α^goal; green, group force F_α^group; yellow, total force F_α. In this case, the obstacle and social forces from other persons cannot be seen as they are small. The arcs represent the (v, ω) actions considered by the controller of Section 7.4.

where K_1, K_2, K_3 and K_4 are the gains for the corresponding forces. These forces represent different effects on the motion of a person:
• F_α^goal(t_i) is the attractive force towards r_α^goal:

$$F_\alpha^{goal}(t_i) = \frac{1}{\tau}\left(v_\alpha^{max}\,\hat{e}_\alpha^{goal}(t_{i-1}) - v_\alpha(t_{i-1})\right)$$

where ê_α^goal(t_{i−1}) is the unit vector from r_α(t_{i−1}) to r_α^goal, and τ is a parameter known as the relaxation time.
• F_α^social(t_i) is the repulsion force from the other pedestrians β, given by the sum of forces Σ_β f_αβ. A pedestrian normally feels uncomfortable when he/she is too close to a stranger. The repulsive force from each other pedestrian β can be represented as:

$$f_{\alpha\beta} = -A\, e^{-d/B}\left[e^{-(n' B \theta)^2}\, t + e^{-(n B \theta)^2}\, \hat{n}\right]$$

where:
– A, B, n′ and n are parameters,
– d is the distance between pedestrians α and β,

– t is the interaction direction, given by d_αβ/|d_αβ| with d_αβ = λ(v_α − v_β) + ê_αβ, where λ is a parameter and ê_αβ is the unit vector from r_α to r_β,
– n̂ is an orthonormal vector to t,
– θ is the angle between t and ê_αβ.
• F_α^obstacle(t_i) is the repulsion force from obstacles. Pedestrians usually walk keeping a security distance from each obstacle in the environment. This force is obtained as the sum of repulsion forces from the static obstacles in the environment, such as walls and chairs. The repulsion force from an obstacle ψ is given by:

$$f_{\alpha\psi} = a\, e^{-d/b}\, \hat{e}_{\alpha\psi}$$

where a and b are parameters, d is the distance to the obstacle, and ê_αψ is the unit vector from r_α to r_ψ.


• Finally, F_α^group(t_i) is the attractive force related to keeping the cohesion of the group of people with whom the pedestrian walks. If the pedestrian is walking alone, this force is not applied. Otherwise, it is given by the sum of three forces related to the group:

$$F_\alpha^{group}(t_i) = K_1^g f_\alpha^{att}(t_i) + K_2^g f_\alpha^{rep}(t_i) + K_3^g f_\alpha^{vis}(t_i)$$

where K_1^g, K_2^g and K_3^g are the weights for the corresponding forces, and:
– f_α^att is the attraction force towards the center of mass of the group, given by:

$$f_\alpha^{att} = q_A\, \hat{e}_\alpha$$

where q_A is a parameter and ê_α is the unit vector from r_α to the center of mass of the group.
– f_α^rep is the repulsion force of the pedestrian from each other pedestrian γ in the group, since all pedestrians keep their own vital space within the group. It is given by:

$$f_\alpha^{rep} = q_R \sum_\gamma \hat{e}_{\alpha\gamma}$$

where q_R is a parameter and ê_αγ is the unit vector from r_α to r_γ.
– f_α^vis is the visibility force: each pedestrian tries to maintain a good position for seeing the rest of the members of the group while trying to reduce his/her head rotation φ. The greater φ, the less comfortable the turning while walking. This is modeled as follows:

$$f_\alpha^{vis} = -q_V\, \varphi\, v_\alpha$$

where q_V is a parameter, φ is the head rotation, ranging from −π/2 to π/2, and v_α is the velocity of the pedestrian.
Figure 26 depicts a graphical representation of the forces. This SFM has been implemented in a library14 that can be used by the navigation stack.
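To make the model concrete, the sketch below assembles the goal, pedestrian and obstacle terms in plain NumPy. It is our own simplification, not the interface of the lightsfm library: the group term is omitted for brevity, and all parameter defaults are placeholders.

```python
import numpy as np

def unit(v):
    n = np.linalg.norm(v)
    return v / n if n > 1e-9 else np.zeros_like(v)

def goal_force(pos, vel, goal, v_max=0.6, tau=0.5):
    # F_goal = (1/tau) * (v_max * e_goal - v)
    return (v_max * unit(goal - pos) - vel) / tau

def pedestrian_force(pos, vel, others, A=1.0, B=0.35, lam=2.0, n=2.0, n_prime=3.0):
    # Sum of the pairwise repulsions f_ab over the other pedestrians.
    total = np.zeros(2)
    for pos_b, vel_b in others:
        e_ab = unit(pos_b - pos)
        d = np.linalg.norm(pos_b - pos)
        t = unit(lam * (vel - vel_b) + e_ab)        # interaction direction
        n_vec = np.array([-t[1], t[0]])             # orthonormal to t
        theta = np.arctan2(t[0] * e_ab[1] - t[1] * e_ab[0], np.dot(t, e_ab))
        total += -A * np.exp(-d / B) * (np.exp(-(n_prime * B * theta) ** 2) * t
                                        + np.exp(-(n * B * theta) ** 2) * n_vec)
    return total

def obstacle_force(pos, obstacles, a=1.0, b=0.2):
    # Sum of repulsions a * exp(-d/b), directed away from each obstacle.
    total = np.zeros(2)
    for obs in obstacles:
        d = np.linalg.norm(obs - pos)
        total += a * np.exp(-d / b) * unit(pos - obs)
    return total

def social_force(pos, vel, goal, others, obstacles, K=(2.0, 2.1, 10.0)):
    """Total force on a lone pedestrian (group term omitted)."""
    return (K[0] * goal_force(pos, vel, goal)
            + K[1] * pedestrian_force(pos, vel, others)
            + K[2] * obstacle_force(pos, obstacles))
```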

7.2.2 Partially observable Monte-Carlo planning

Formally, a discrete POMDP can be defined by the tuple ⟨S, A, Z, T, O, R⟩ [33]:
• The state space is the finite set of possible states s ∈ S, for instance robot and people poses.
• The action space is the finite set of possible actions that the robot can take, a ∈ A.
• The observation space consists of the finite set of possible observations z ∈ Z from the onboard sensors.
• After performing an action a, the state transition is modeled by the conditional probability function T(s′, a, s) = p(s′|a, s), which indicates the probability of reaching state s′ if action a is performed at state s.

14https://github.com/robotics-upo/lightsfm


• The observations are modeled by the conditional probability function O(z, a, s′) = p(z|a, s′), which gives the probability of obtaining observation z given that the state is s′ and action a was performed.
• The reward obtained for performing action a at state s is R(s, a).
The state is not fully observable; at every time instant the agent only has access to observations z, which give incomplete information about the state. Thus, a belief function b is maintained using Bayes' rule. If action a is applied at belief b and observation z is obtained, the new belief b′ is given by:

$$b'(s') = \frac{1}{\eta}\, O(z, a, s') \sum_{s \in S} T(s', a, s)\, b(s) \qquad (28)$$
where the normalization constant

$$\eta = p(z|b, a) = \sum_{s' \in S} O(z, a, s') \sum_{s \in S} T(s', a, s)\, b(s) \qquad (29)$$
gives the probability of obtaining a certain observation z after executing action a for a belief b.
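For a finite state set, the update of Eqs. (28)-(29) reduces to a few array operations; the sketch below (ours, with dense NumPy arrays for T and O) is only meant to make the bookkeeping explicit.

```python
import numpy as np

def belief_update(b, a, z, T, O):
    """Discrete Bayes filter update of Eqs. (28)-(29).

    b: (|S|,) current belief.
    T: (|A|, |S|, |S|) transition model, T[a, s, s'] = p(s' | a, s).
    O: (|A|, |S|, |Z|) observation model, O[a, s', z] = p(z | a, s').
    Returns the updated belief b' and eta = p(z | b, a).
    """
    predicted = b @ T[a]               # sum_s T(s', a, s) b(s), shape (|S|,)
    unnormalized = O[a][:, z] * predicted
    eta = unnormalized.sum()
    return unnormalized / eta, eta
```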

The objective of a POMDP is to find a policy that maps beliefs into actions in the form π(b) → a, so that the value is maximized. The value function represents the expected total reward earned by following π during h time steps starting at the current belief b:
$$V^\pi(b) = E\left[\sum_{t=0}^{h} \gamma^t\, r(b_t, \pi(b_t)) \,\Big|\, b_0 = b\right], \quad \text{where} \quad r(b_t, \pi(b_t)) = \sum_{s \in S} R(s, \pi(b_t))\, b_t(s).$$
Rewards are weighted by a discount factor γ ∈ [0, 1) to ensure that the sum is finite when h → ∞. Therefore, the optimal policy π* is the one that maximizes this value function: π*(b) = argmax_π V^π(b).
In general, POMDPs face scalability issues due to their computational complexity. This is mainly due to the curse of history and the curse of dimensionality. The former comes from the fact that POMDPs plan over all future actions and observations in a receding time horizon; the latter from the fact that the complexity of the model and the belief space grows with the number of states, actions and observations. POMDP solvers attempt to compute optimal policies while tackling these complexities, and most of them are based either on the Value Iteration or the Policy Iteration algorithm, iterating over the value function or the policy, respectively.
For our work we are interested in online POMDP solvers [54]. Those compute policies for the POMDP in an online fashion from the current belief, instead of precomputing the policy offline before the execution of the task at hand. Most online solvers use forward search over the current belief state to approximate the value function (Value Iteration). The idea is to build a search tree of belief states expanding from the current one with all possible actions and observations, using some heuristic to determine which leaves should be expanded first. Besides, upper and lower bounds on the value function are propagated along the tree during the search, and the best policy found so far is maintained. As said before, increasing the number of actions and observations makes the tree grow for a given depth. Also, the solver usually requires an explicit model of the POMDP distributions enumerating the states, which gets more complex as the state space grows. Factored models and Monte-Carlo simulations can be used to alleviate this latter problem.
In Monte-Carlo planning, instead of using an explicit model with all probability transitions of the POMDP, a simulator G is used as a black box to generate future states and observations. Given a state and action, the simulator provides a sample of a successor state, observation and reward: (s′, z, r) ∼ G(s, a). The simulator is then used to compute sequences of states, observations and rewards, approximating the value function. The simulator deals with the complexity of the model by taking samples and simulating the model, instead of describing it explicitly. There is no need to enumerate all transitions between states, but just to generate new states for the sampled ones.
Partially Observable Monte-Carlo Planning (POMCP) is a particular online algorithm for solving large POMDPs. We do not reproduce the whole algorithm here, but summarize its main features and refer the reader to [56] for more details. POMCP generates a tree where each node represents a belief (or history). Each node can be expanded into new nodes by selecting an action and an observation, and the root node is the current belief state. Each node also stores its average expected reward and the number of times that simulations have passed through it. The algorithm runs many simulations, starting at states sampled from the current belief. For every node (history) encountered during a simulation, its counter is increased and its belief is updated according to the simulated state. Thus, the search procedure expands the tree, selecting in the end the action with the greatest expected value. At that moment, the action is performed, a real observation is measured, and the reached belief becomes the new root node of the tree for the following search. The remainder of the tree is pruned. In order to expand the leaf nodes, a rollout policy is used to generate the actions. That policy can be random or heuristic.
Another advantage of POMCP is that the state factors can be continuous instead of discrete. As long as the actions and observations are discrete, the algorithm can be run by generating simulations and counting histories. If we want to consider continuous states, the only requirement is to have a simulator G that is able to deal with continuous states and sample them.
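The role of the black-box simulator can be illustrated with a plain Monte-Carlo rollout that estimates the value of a fixed first action from a particle-based belief. This is a stripped-down sketch of the underlying idea, not the POMCP tree search of [56].

```python
import random

def rollout_value(belief_particles, first_action, G, rollout_policy,
                  depth=20, gamma=0.95, n_sims=100):
    """Monte-Carlo estimate of the value of taking `first_action` at a
    belief represented by a list of state particles, using the black-box
    generator G(s, a) -> (s', z, r)."""
    total = 0.0
    for _ in range(n_sims):
        s = random.choice(belief_particles)
        a, discount, value = first_action, 1.0, 0.0
        for _ in range(depth):
            s, _, r = G(s, a)
            value += discount * r
            discount *= gamma
            a = rollout_policy(s)       # random or heuristic rollout policy
        total += value
    return total / n_sims
```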

7.3 Hierarchical planner for walking side-by-side with a person

We propose a two-level planning architecture for the social task of walking side-by-side with a person while talking. At the top level, a POMDP-based planner decides on the best areas in which to place the robot while accompanying the person. This module is able to plan under uncertainty with a receding horizon, predicting the person's poses and intentions in the near future with the help of the Social Force Model of Section 7.2.1. The exact POMDP model is detailed below and is solved with an online algorithm based on the POMCP solver [56], which has been implemented in a library15. The top-level POMDP decides the best areas in which to place the robot; these are translated into relative waypoints around the accompanied person and sent to a lower-level controller. This low-level controller consists of a social navigation algorithm based on the Social Force Model explained before. The controller is able to navigate the robot in a social manner, avoiding obstacles and taking into account the positions of all people around in order to reproduce socially acceptable trajectories. Moreover, the controller attempts to achieve the sub-goals specified by the top-level POMDP planner, which commands whether the robot has to accompany the person from behind, at her/his left/right, etc.
In this architecture, the high-level module is in charge of planning taking into account uncertainties and future outcomes, mainly regarding the accompanied person's intentions and positions. The low-level module is in charge of following social local trajectories with obstacle avoidance, taking into account all people around (not only the accompanied person) and the preferences from the higher-level planner, which indicates sub-goals for the controller around the target person. Both modules are detailed in the following.

15https://github.com/robotics-upo/lightpomcp


Algorithm 2: SFM-based controller for robot navigation

Require: Map with static obstacles M; set of predefined pairs of linear and angular velocities V; maximum linear and angular accelerations (a_max, ω̇_max).
1: repeat
2:   Get operational mode: k ∈ {left, right, behind}
3:   Get pose and velocity of robot: r_r, v_r
4:   Get pose and velocity of target: r_t, v_t
5:   Get set of poses and velocities of surrounding people: P
6:   r^goal ← computeGoal(k, r_t, v_t)
7:   v_d ← SFM(r_r, v_r, r^goal, r_t, v_t, M, P)
8:   for each velocity pair (v, ω) ∈ V do
9:     Check if the pair is reachable given v_r and (a_max, ω̇_max)
10:    Check if the pair can provoke a collision
11:    Compute the fitness value α_(v,ω)
12:  end for
13:  Select the valid (v, ω) with the best fitness value
14: until end of control

7.4 Low-level controller based on SFM

The SFM-based controller is in charge of selecting pairs of linear and angular velocities in real time (∼15 Hz) so that the robot keeps a social behavior while it walks together with the target. At each time step, the SFM explained in Section 7.2.1 generates a desired velocity vector v_d for the robot, considering obstacles and surrounding people and keeping both target and robot in the same group. Using an SFM for walking side-by-side missions was introduced by Ferrer et al. [24], and we apply that idea here to develop the low-level controller, trying to navigate the robot as if it were a person.
The local goal r^goal of the robot at each instant of time is located according to an operational mode that can be left, right or behind the target; this local goal is considered as the desired goal of the robot in the SFM. The high-level planner explained below is in charge of commanding operational modes at a lower frequency, and the low-level controller follows the selected mode, recalculating the goal to the left, right or behind the target as the robot navigates. The distance from the local goal to the target is a parameter of the algorithm, with a default value of 1 meter. Note that the low-level controller also needs the target's position to compute the position of the goal given the operational mode. This is estimated by the high-level planner as part of the belief and sent to the low-level controller together with the operational mode. Hence, we reduce the effects that errors in the observations may have on the controller's performance.
The movement of the robot is assumed to be social when it follows the desired velocity vectors coming from the SFM, since the different social forces are taken into account. The controller needs to select pairs of linear and angular velocities for the robot in order to track the desired SFM velocity vector, while respecting kinematic constraints and security distances to obstacles and people. For that, we developed Algorithm 2, in which we apply an approach similar to the Dynamic Window Approach [26].
At each iteration, Algorithm 2 observes the positions and velocities of the robot, the target and the other people around. The goal of the robot is computed depending on the operational mode, to the left, right or behind the target. Then, the SFM provides a desired velocity for the robot, v_d. The robot selects its linear and angular velocities from a predefined set of discrete

values V, which is obtained taking into account the range of possible velocities. Pairs not fulfilling the acceleration constraints or that may provoke collisions are discarded. Collision checking is performed against the static map M and the people around by simulating the velocities with a simple kinematic model of the robot. The admissible pairs of velocities are evaluated with a fitness function, and the one with the best fitness value is selected and commanded to the robot. The fitness is computed as follows:

$$\alpha_{(v,\omega)} = \beta_v \frac{|v - v_d|}{v_{max}} + \beta_y \frac{|\theta - \theta_d|}{\pi} + \beta_d \left(1 - \frac{d}{D}\right)$$
where:
• β_v, β_y and β_d are weighting parameters;
• v_d is the magnitude of the desired velocity vector v_d;
• v_max is the maximum velocity of the robot;
• θ is the orientation of the robot after applying (v, ω) for a fixed time dt;
• θ_d is the desired orientation of the robot, given by the desired velocity v_d;
• d is the distance to the closest obstacle along the trajectory obtained by applying (v, ω);
• D is the maximum detection range for obstacles, given by the robot sensors.
The idea behind the fitness function is to simulate the trajectory of the robot when following (v, ω) (a simple kinematic model is used) and to evaluate how close to the desired velocity the robot ends up. For that, the fitness function measures the differences in magnitude and angle of the final robot velocity with respect to the desired one. Besides, trajectories with farther obstacles are scored better. Note that all terms in the fitness function are normalized.
Even though this SFM controller is supposed to receive the operational mode from a higher-level planner, it could also operate without the planner, just determining the operational mode with some heuristic. For instance, an option is to select the position at the left, right or behind the target depending on which is closest to the robot at each moment. Moreover, if the target is not being detected or is farther than a threshold, the heuristic could select an extra operational mode that uses as local goals for the robot the last positions where the target was observed, following her/his trail.
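A stripped-down version of the velocity selection step is sketched below. Reading the expression above as a deviation cost (our interpretation, so the best pair is the one with the smallest value), the controller simulates each admissible (v, ω) pair with a unicycle model and keeps the one that best tracks the SFM velocity while staying clear of obstacles. The obstacle_distance callable and all constants are placeholders.

```python
import numpy as np

V_MAX, D_MAX, DT = 0.6, 5.0, 0.5                 # assumed limits and horizon
BETA_V, BETA_Y, BETA_D = 0.4, 0.3, 0.3

def simulate(pose, v, w, dt=DT):
    """Propagate a unicycle pose (x, y, theta) for one control period."""
    x, y, th = pose
    return np.array([x + v * np.cos(th) * dt, y + v * np.sin(th) * dt, th + w * dt])

def select_velocity(pose, v_desired, velocity_pairs, obstacle_distance):
    """Pick the admissible (v, w) pair that best tracks the SFM desired
    velocity vector, interpreting the fitness expression as a cost."""
    v_d = np.linalg.norm(v_desired)
    theta_d = np.arctan2(v_desired[1], v_desired[0])
    best, best_cost = None, np.inf
    for v, w in velocity_pairs:
        end = simulate(pose, v, w)
        ang = np.arctan2(np.sin(end[2] - theta_d), np.cos(end[2] - theta_d))
        d = min(obstacle_distance(end), D_MAX)
        cost = (BETA_V * abs(v - v_d) / V_MAX
                + BETA_Y * abs(ang) / np.pi
                + BETA_D * (1.0 - d / D_MAX))
        if cost < best_cost:
            best, best_cost = (v, w), cost
    return best
```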

7.5 High-level planner based on POMCP

In this section, we describe the POMDP-based planner used at the higher level of our navigator for walking side-by-side with a person. The planner consists of a POMCP solver that searches online in order to find an optimal policy for the model defined. First, we introduce our POMDP model, which has continuous states and discrete actions and observations.

7.5.1 States

The state is defined as the tuple s = (r, p), where r = (x_r, y_r, v_r^x, v_r^y) represents the 2D position and linear velocity of the robot, and p = (x_p, y_p, v_p^x, v_p^y, g_p) represents the 2D position, linear velocity and goal of the person being accompanied. The state is composed of continuous variables, except for the goal g_p. For the latter, we define a discrete set of goal positions G = {G_1, ..., G_m},

which represent spots typically visited in the environment, such as vending machines, doors, and so on. We assume that the person always intends to go to one of those goals.

7.5.2 Actions

As possible actions, we define a discrete set A = {left, right, behind, goal(g)}. The first three actions correspond to sectors around the target person where the robot could place itself while accompanying; thus, the robot can try to place itself in the commanded sector. The fourth one is a class of actions parameterized by a goal g ∈ G, so the size of the action set is 3 + |G|. When goal(g) is commanded, the robot is sent to one of the possible goals of the person. For that, it computes a path to g by means of an A* algorithm and takes the closest point of the path as a short-term goal (see the transition functions below). This local goal is used as the attractor to calculate the desired social forces (see Section 7.2.1).
The planner sends the operational mode (left, right, behind or goal(g)) to the low-level controller, which is in charge of navigating the robot socially and following the commanded behavior. The controller keeps the last operational mode until a new one is received. The goal(g) action is included to cover cases where the robot is quite certain about the target's final goal, in which case it can be used as a joint goal for both.

7.5.3 Observations

The position of the robot r is assumed to be observable, since the uncertainties associated with its localization system are negligible compared to those in the target's pose. Thus, the observation z consists of the observed position of the target person. If the person cannot be detected, z = hidden; otherwise z ∈ P, where P is a discrete set of possible locations in the scenario. In order to obtain P, the environment is discretized into a grid with resolution cellsize. The goal of the person cannot be observed directly.

7.5.4 Reward function

There are several objectives for the POMDP: (i) the robot should stay as far as possible from obstacles; (ii) it should stay close to the accompanied person; and (iii) it should keep the best relative position to optimize the comfort of the target walking alongside. Thus, a multi-objective reward function is proposed:
$$R(s, a) = k_o\, d_o - k_p\, d_p + k_\alpha\, |F_r^{group}|$$
where d_o is the distance of the robot to the closest obstacle, d_p is the distance of the robot to the accompanied person, and |F_r^group| is the magnitude of the group force vector for the target, as described in Section 7.2.1. We use that force to measure the level of comfort of the target with respect to the robot. k_o, k_p and k_α are parameters that weight the different objectives. Moreover, a static map of the environment is used to compute the distance to obstacles in the term d_o. Note that the obstacles do not need to be maintained explicitly in the state, since the map is assumed to be observable and fixed over time.

7.5.5 Transition and observation probability functions

In this POMDP, an action is supposed to be executed each ∆t seconds. The transition and observation functions are used to predict future states and observations, and also to update

the belief according to Bayes' rule. In the POMCP algorithm, the transition and observation models are encoded within a black-box simulator that can sample a successor state, observation and reward from an initial state and action: (s′, z, r) ∼ G(s, a).

The goal of the robot gr depends on the action and it is set with respect to the person’s po- sition. In particular, gr is placed to the left/right of the person if the action is left/right, and behind the person if the action is behind (at a social distance of the target not to make her/him uncomfortable). In addition, the goal that we have for the target is a long-term goal, i.e., where the target is traveling to during the task. However, the SFM needs to know short-term goals for the agents in order to predict their positions locally. Otherwise, agents may get trapped into local minima. Therefore, given the goal gp and the person pose (xp, yp), we compute a path of the target to its goal by means of an A∗ algorithm, and take the closest waypoint in the path as its local goal. The observation z measures the position of the target. There is a probability of not detecting it and observing hidden, which depends on the distance between the robot and the person dp. Thus, p(z = hidden) = pfn if dp < rs; and p(z = hidden) = 1 − pfp otherwise. rs is a parameter indicating the range of the robot sensor for people detection, while pfn and pfp are the probabilities of false negative and false positive detections, respectively. Also, a Gaussian distribution N (xp, yp;Σp) is evaluated at the centroid of each cell in P , obtaining a probability of being detected within that cell. All those Gaussian probabilities are normalized accordingly in order to obtain the complete observation function.

7.5.6 Other issues

The POMCP algorithm requires a way to sample initial states s ∼ I when the planning starts. The positions and velocities of the person and the robot can be obtained from the last available observations; the planner waits until the accompanied person is not hidden. The person's goal is initialized assuming a uniform distribution over G when no additional information is provided.
Our POMCP algorithm does not expand all branches associated with actions in the tree. We save some computation by first checking which actions are valid and discarding the others. The function check_action: (s, a) → {yes, no} determines whether an action is feasible at a certain state. Actions that would drive the robot too close to an obstacle or to the accompanied person are labeled as not valid. After executing the action selected by POMCP, an actual observation z is measured to update the belief. If this observation is not included in the POMCP tree, the tree is reset.


Figure 27: Walking side-by-side simulated environment

7.6 Experiments

In this subsection we show some experimental results to test our method for walking side-by-side with the visitor during a conversation. In order to highlight the advantages of the hierarchical approach based on POMCP, we compare our method with an alternative heuristic version which does not reason over uncertainties or future actions.
In the heuristic controller, the robot still uses the SFM low-level controller detailed in Section 7.4 in order to perform the walking side-by-side macro-action. However, instead of receiving operational modes from the high-level planner, it uses some heuristics to determine the best mode at each moment. In particular, the robot places its goal to the left, right or behind the person depending on which of them is closest. Thus, it tends to follow the target from the same direction. When the target is farther than a lookahead parameter (2 m in these experiments) or is not detected, the robot uses the trail of the last observed target positions in order to place its goals along the path the target followed. If the robot catches up with the target and the target comes back within its lookahead, operational modes are selected again, as explained before. Otherwise, if the robot reaches the end of the target's trail without detecting it again, it waits there until new observations are obtained.
We ran experiments with our hierarchical planner (Section 7.3) and with the heuristic version explained above in a simulated environment. This simulated scenario represents the intersection of three corridors where the robot and a person should walk side-by-side (see Figure 27). They both always start at the same position (point A at the left of the scenario) and have to walk while conversing to a common goal selected randomly between two options (points B or C at the ends of the two corridors). The movement of the target toward its goal is simulated with the SFM, selecting as short-term goals points of a path precomputed with an A* algorithm. Of course, the target's goal is unknown to the robot.
In our real experiments with the TERESA robot (see Section 8), we experienced how noisy observations of the people walking around the robot negatively affected the performance of the system. Therefore, in order to simulate these issues in a more realistic manner, we included noise in the observed poses of the target in our simulator. In particular, we added a Gaussian error N(0, 0.25) (in meters) to the observations of the target. Moreover, we also simulated false negatives with a predefined probability p_f; if a false negative occurs, the target is not detected for two seconds. The parameters for the Social Force Model and for our hierarchical algorithm were hand-tuned after testing different situations and are summarized in Table 7.
We ran the same set of simulations with our POMDP-based approach and with the heuristic version, comparing the results. All simulations started with the person and the robot together at


Table 7: Parameters for the walking side-by-side algorithms

Group                   Parameter                 Value
SFM                     K1                        2.0
                        K2                        2.1
                        K3                        10.0
                        K4                        1.0
                        τ                         0.5 s
                        A                         1.0
                        B                         0.35
                        n′                        3.0
                        n                         2.0
                        λ                         2.0
                        a                         1.0
                        b                         0.2
                        K1^g                      2.0
                        K2^g                      1.0
                        K3^g                      3.0
                        qA, qR, qV                1.0
Low-level controller    βv                        0.4
                        βy                        0.3
                        βd                        0.3
                        vmax                      0.6 m/s
                        amax, ω̇max                1.0 m/s², 1.0 rad/s²
Heuristics              Lookahead radius          2.0 m
POMCP                   pfn                       0.1
                        pfp                       0.0
                        Planning time             0.2 s
                        Time between actions      0.5 s
                        cellsize                  1.0 m
                        Σp                        0.25 m
                        ko, kp, kα                1, 1, -100


Figure 28: Simulated results for the walking side-by-side behavior. On the x axis, the probability of false negatives pf; on the y axes, different metrics (mean and standard error) for our POMCP-based planner (red squares) and the heuristic approach (blue squares). Left: total time; right: waiting time.



Figure 29: Simulated results for the walking side-by-side behavior. On the x axis, the probability of false negatives pf; on the y axis, the social group force (mean and standard error) for our POMCP-based planner (red squares) and the heuristic approach (blue squares).

All simulations started with the person and the robot together at point A, and in each run the person randomly selected a goal between B and C, moving towards it according to our SFM. Moreover, we gradually increased the probability of false negatives pf and ran 10 simulations for each value, in order to show how the performance evolves as the environment becomes noisier. The following metrics were measured during the experiments to evaluate both approaches (a sketch of their computation is given at the end of this subsection):

• Total time: time to finish the walking side-by-side macro-action.

• Waiting time: average time that the target waits for the robot.

• Social group force: average social group force Fgroup on the target during the simulation.

The results obtained for these metrics can be seen in Figures 28 and 29. In our simulations, the target and the robot behave as a group according to the SFM, which is modeled by the social group force. Thus, the social group force should be as small as possible, indicating that both navigate together smoothly without needing to pull each other. Also, due to this force, the target tends to wait for the robot when it lags behind. We measure this waiting time, which should be as small as possible. The total time indicates when both reach the destination and should also be as short as possible.

In Figure 28, for the heuristic approach, it can be seen how the total time of the macro-action and the waiting time of the target increase as the failures in the person's detection do. This means that the performance is worse: the target needs to wait longer for the robot during the experiment, and both take longer to arrive. In contrast, our POMDP-based approach maintains a better performance as the observations from the detection system get worse, showing a graceful degradation. This is because our approach deals with uncertainties and is better able to overcome these issues with the perception system, whereas the heuristic approach is not. Note that we modeled a probability of false negatives pfn = 0.1 in our POMDP, but we tested the system under worse conditions, even up to pf = 0.8; even in this case the approach's performance does not degrade significantly.

Regarding Figure 29, we can also see that the POMDP-based planner is able to keep a lower average social group force. This indicates that the robot and the target navigate together in a smoother manner and do not need to attract each other as much as in the heuristic case. In the latter case, the robot stays behind more often and the attracting forces needed to stick to the person are higher.

In summary, our preliminary simulations show the advantages of using a POMDP-based planner on top of the SFM controller to navigate the robot jointly with the person toward a common goal. The system copes better with uncertainties in the perception system and carries out the task more smoothly. We experienced these kinds of issues with people detections with the real perception system of TERESA, which indicates the usefulness of our approach for increasing the robustness of the whole TERESA system.
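As referenced above, the following sketch indicates how the three metrics could be computed from a logged simulation run. The per-step log format (time stamp, waiting flag, group-force magnitude) is an assumption of ours; the actual simulator logging is not reproduced here.

    def compute_metrics(log):
        """Compute the walking side-by-side metrics from one simulation run.

        log: list of per-step records (dicts) with keys
             't'       -- simulation time stamp (s)
             'waiting' -- True if the target is waiting for the robot at this step
             'f_group' -- magnitude of the social group force on the target (N)
        Returns (total_time, waiting_time, avg_group_force).
        """
        total_time = log[-1]['t'] - log[0]['t']       # time to finish the macro-action
        dt = total_time / max(len(log) - 1, 1)        # assumes a fixed simulation step
        waiting_time = sum(dt for step in log if step['waiting'])
        avg_group_force = sum(step['f_group'] for step in log) / len(log)
        return total_time, waiting_time, avg_group_force

Means and standard errors over the 10 runs per value of pf would then be aggregated to produce curves such as those shown in Figures 28 and 29.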


Figure 30: Some instants of different trials at the Centre Sportif. Top: "robot view". Bottom: some pictures of the demo.

8 Integrated Experiments

The navigation stack described above was deployed on the TERESA robot and integrated with the rest of the modules of the system for Experiment 4 of the project, in Troyes, France. During these experiments, all macro-actions except walking side-by-side were integrated and available to the visitor. In this section, we present some results from these experiments. Two different kinds of experiments are shown. The first corresponds to a demonstration with the TERESA robot at an event called "Faite-vous Plaisir, Seniors"16, on the well-being of elderly people, which took place on October 19, 2016 at the Centre Sportif de l'Aube (Troyes, France). The second corresponds to the experiments of the project themselves, which took place at the day centre Les Arcades (Troyes, France).

8.1 Centre Sportif de l’Aube

As mentioned, the robot was demonstrated during an event on the well-being of elderly people. The venue consisted of an area of approximately 25 x 28 m, with a main hall, two corridors, stairs and two rooms (Fig. 1 shows the robot in the main hall). Several booths from different companies were set up there. During the demo, the TERESA robot could be used by any visitor. The autonomous navigation and conversation capabilities were activated. Many people were visiting the different stands, walking around or standing and conversing while having a coffee, which made the environment challenging for navigation. The robot worked successfully in that environment the whole day, from 9:30 to 17:00. Figure 30 shows some instants of one navigation run through the environment. A video of the experiment can be accessed at the D4.3: Navigation Demonstrator webpage17.

16http://teresaproject.eu/wp-content/uploads/2016/07/TERESA-Dissemination-Press-release.pdf 17http://robotics.upo.es/teresa/teresa-d43.html


Figure 31: Pictures of experiment 4 at Les Arcades.

8.2 Les Arcades

The final experiments of the project took place at the elderly day centre "Les Arcades" in Troyes, France, from October 24th to November 2nd, 2016. Figure 31 shows some pictures of people interacting with the TERESA robot at Les Arcades. A full description of the experiments can be found in Deliverable D6.6 [19]. The different navigation macro-actions (except walking side-by-side) were employed during Experiment 4. Figure 32 shows an illustrative example of the approaching macro-action, in which the robot approaches the person indicated by the user in order to initiate an interaction. Another example, in this case of the navigation-to-waypoint macro-action, can be seen in Figure 33. Videos of many sessions of Experiment 4 can be consulted on the D4.3: Navigation Demonstrator webpage, where the behavior of, and the transitions between, the different modules of the integrated system can be observed. Explanations of the information represented in the videos are also provided.


Figure 32: Approaching an interaction target at Les Arcades. The robot is commanded, through the interface, to approach the person with identifier 366 (in the interface, the user just has to click on the oval that appears on the person).

9 Conclusions

This deliverable has summarized the work on Tasks 4.2 and 4.3. As a result, a human-aware navigation stack has been incorporated into the TERESA robot. This stack is furthermore integrated with the perception modules from WP2, the interface from WP3 and the body-pose controller from WP5 to provide a semi-autonomous telepresence system with social intelligence. The navigation stack considers global planning, local planning and execution. Normative social behavior is included through cost functions and social constraints learned from demonstrations. This has been compared with non-human-aware stacks, showing how it leads to different motion behaviors. A high-level task planner under uncertainty and a low-level social controller are also integrated for the task of walking side-by-side. This way, uncertainties in the sensor data and in unobservable latent variables, such as people's goals and intentions, can be taken into account to carry out human-robot joint tasks. The navigation stack has been deployed and tested in real environments. The document describes tests performed at the Centre Sportif de l'Aube, in Troyes, France, during a one-day event. It also includes videos of the tests at Les Arcades, where the robot worked with the navigation stack during several trials over two weeks. Some videos of those experiments can be found at http://robotics.upo.es/teresa/teresa-d43.html.


Figure 33: Navigation to a waypoint at Les Arcades. The robot is commanded, through the interface, to go to a certain predefined waypoint (in the interface, the user just has to click on the name of the waypoint he or she wants to go to).

Bibliography

[1] Pieter Abbeel and Andrew Y. Ng. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the twenty-first international conference on Machine learning, ICML '04, pages 1–, New York, NY, USA, 2004. ACM.
[2] Haoyu Bai, Shaojun Cai, Nan Ye, David Hsu, and Wee Sun Lee. Intention-aware online POMDP planning for autonomous driving in a crowd. In Robotics and Automation (ICRA), 2015 IEEE International Conference on, pages 454–460, May 2015.
[3] Joydeep Biswas and Manuela M. Veloso. Localization and navigation of the cobots over long-term deployments. The International Journal of Robotics Research, 32(14):1679–1694, 2013.
[4] S. Calinon. Robot Programming by Demonstration: A Probabilistic Approach. EPFL/CRC Press, 2009. EPFL Press ISBN 978-2-940222-31-5, CRC Press ISBN 978-1-4398-0867-2.
[5] S. Calinon and A. Billard. A probabilistic programming by demonstration framework handling constraints in joint space and task space. Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS, 2008.
[6] Jonathan Claassens. A RRT-based path planner for use in trajectory imitation. In Proc. of the International Conference on Robotics and Automation, ICRA, pages 3090–3095. IEEE, 2010.
[7] TERESA Consortium. Deliverable D2.2: Pose Estimation. http://teresaproject.eu/wp-content/uploads/2016/06/D2.2_Localisation.pdf.
[8] TERESA Consortium. Deliverable D3.1: Normative Behavior Report. http://teresaproject.eu/wp-content/uploads/2015/03/D3_1_NormativeBehaviourReport_TERESA.pdf.
[9] TERESA Consortium. Deliverable D4.1: Models for Social Navigation. http://teresaproject.eu/wp-content/uploads/2015/03/D4_1_Models_for_social_navigation_TERESA.pdf.
[10] TERESA Consortium. Deliverable D4.2: Models for Social Navigation, Part 2. http://teresaproject.eu/wp-content/uploads/2015/09/D4_2_Navigation_Models_Part_2_TERESA.pdf.
[11] TERESA Consortium. Deliverable D5.1: Integrated Feedback System, Part 1. http://teresaproject.eu/wp-content/uploads/2015/03/D5_1_Integrated_Feedback_System_TERESA.pdf.
[12] TERESA Consortium. Deliverable D5.2: Action Library. http://teresaproject.eu/wp-content/uploads/2015/08/D5_2_Action_Library_TERESA.pdf.


[13] TERESA Consortium. Deliverable D5.3: Integrated Feedback System, Part 2. http://teresaproject.eu/wp-content/uploads/2016/06/D5_3_Feedback_Part_2.pdf.
[14] TERESA Consortium. Deliverable D5.5: Integrated Feedback System, Part 3.
[15] TERESA Consortium. Deliverable D6.1: Requirements Report. http://teresaproject.eu/wp-content/uploads/2015/03/D6_1_Requirements-Report_-TERESA1.pdf.
[16] TERESA Consortium. Deliverable D6.3: First version of the semi-autonomous telepresence system. http://teresaproject.eu/wp-content/uploads/2015/03/D6_1_Requirements-Report_-TERESA1.pdf.
[17] TERESA Consortium. Deliverable D6.4: Second version of the semi-autonomous telepresence system.
[18] TERESA Consortium. Deliverable D6.5: Third version of the semi-autonomous telepresence system.
[19] TERESA Consortium. Deliverable D6.6: Evaluation report.
[20] Marco Cristani, Loris Bazzani, Giulia Paggetti, Andrea Fossati, Diego Tosato, Alessio Del Bue, Gloria Menegaz, and Vittorio Murino. Social interaction discovery by statistical analysis of F-formations. British Machine Vision Conference (BMVC), pages 23.1–23.12, 2011.
[21] M. Desai, K.M. Tsui, H.A. Yanco, and C. Uhlik. Essential features of telepresence robots. In Technologies for Practical Robot Applications (TePRA), 2011 IEEE Conference on, pages 15–20, April 2011.
[22] D. Feil-Seifer and M. Mataric. People-aware navigation for goal-oriented behavior involving a human partner. In Proceedings of the IEEE International Conference on Development and Learning (ICDL), 2011.
[23] G. Ferrer and A. Sanfeliu. Proactive kinodynamic planning using the extended social force model and human motion prediction in urban environments. In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1730–1735, Sept 2014.
[24] Gonzalo Ferrer, Anaís Garrell Zulueta, Fernando Herrero Cotarelo, and Alberto Sanfeliu. Robot social-aware navigation framework to accompany people walking side-by-side. Autonomous Robots, pages 1–19, 2016.
[25] Dieter Fox, W. Burgard, and Sebastian Thrun. The dynamic window approach to collision avoidance. IEEE Robotics and Automation, 4(1), 1997.
[26] Dieter Fox, W. Burgard, and Sebastian Thrun. The dynamic window approach to collision avoidance. IEEE Robotics and Automation, 4(1), 1997.
[27] A. Goldhoorn, A. Garrell, R. Alquézar, and A. Sanfeliu. Continuous real time POMCP to find-and-follow people by a humanoid service robot. In 2014 IEEE-RAS International Conference on Humanoid Robots, pages 741–747, Nov 2014.
[28] Edward T. Hall. The Hidden Dimension. Anchor, October 1990.
[29] Nick Hawes, Chris Burbridge, Ferdian Jovan, Lars Kunze, Bruno Lacerda, Lenka Mudrová, Jay Young, Jeremy Wyatt, Denise Hebesberger, Tobias Körtner, et al. The STRANDS project: Long-term autonomy in everyday environments. arXiv preprint arXiv:1604.04384, 2016.


[30] Dirk Helbing and Péter Molnár. Social force model for pedestrian dynamics. Phys. Rev. E, 51:4282–4286, May 1995.
[31] Peter Henry, Christian Vollmer, Brian Ferris, and Dieter Fox. Learning to navigate through crowded environments. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages 981–986, 2010.
[32] O. Islas Ramírez, H. Khambhaita, R. Chatila, M. Chetouani, and R. Alami. Robots learning how and where to approach people. In RO-MAN 2016, 25th IEEE International Symposium on Robot and Human Interactive Communication, 2016. To appear.
[33] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101:99–134, 1998.
[34] T. Kanda, D.F. Glas, M. Shiomi, and N. Hagita. Abstracting people's trajectories for social robots to proactively approach customers. Robotics, IEEE Transactions on, 25(6):1382–1396, Dec 2009.
[35] Sertac Karaman and Emilio Frazzoli. Sampling-based algorithms for optimal motion planning. The International Journal of Robotics Research, 30(7):846–894, 2011.
[36] R. Kirby, J. J. Forlizzi, and R. Simmons. Affective social robots. Robotics and Autonomous Systems, 58:322–332, 2010.
[37] Rachel Kirby, Reid G. Simmons, and Jodi Forlizzi. Companion: A constraint-optimizing method for person-acceptable navigation. In RO-MAN, pages 607–612. IEEE, 2009.
[38] Henrik Kretzschmar, Markus Kuderer, and Wolfram Burgard. Learning to predict trajectories of cooperatively navigating agents. In Robotics and Automation (ICRA), 2014 IEEE International Conference on, pages 4015–4020. IEEE, 2014.
[39] Thibault Kruse, Amit Kumar Pandey, Rachid Alami, and Alexandra Kirsch. Human-aware robot navigation: A survey. Robotics and Autonomous Systems, 61(12):1726–1743, 2013.
[40] Markus Kuderer, Shilpa Gulati, and Wolfram Burgard. Learning driving styles for autonomous vehicles from demonstration. In Proceedings of the IEEE International Conference on Robotics & Automation (ICRA), Seattle, USA, volume 134, 2015.
[41] Markus Kuderer, Henrik Kretzschmar, Christoph Sprunk, and Wolfram Burgard. Feature-based prediction of trajectories for socially compliant navigation. In Proc. of Robotics: Science and Systems (RSS), Sydney, Australia, 2012.
[42] M. Luber, L. Spinello, J. Silva, and K. O. Arras. Socially-aware robot navigation: A learning approach. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 902–907, Oct 2012.
[43] M. Luber, G.D. Tipaldi, and K.O. Arras. Place-dependent people tracking. The International Journal of Robotics Research, 30(3):280–293, 2011.
[44] Luis Merino, Joaquín Ballesteros, Noé Pérez-Higueras, Rafael Ramón-Vigo, Javier Pérez-Lara, and Fernando Caballero. Robust person guidance by using online POMDPs. In Manuel A. Armada, Alberto Sanfeliu, and Manuel Ferre, editors, ROBOT2013: First Iberian Robotics Conference, volume 253 of Advances in Intelligent Systems and Computing, pages 289–303. Springer International Publishing, 2014.
[45] D. Meyer-Delius, M. Beinhofer, and W. Burgard. An instantaneous topological map for correlated stimuli. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Toronto, Canada, 2012.


[46] M. Moussaid, D. Helbing, S. Garnier, A. Johansson, M. Combe, and G. Theraulaz. Experimental study of the behavioural mechanisms underlying self-organization in human crowds. Proceedings of the Royal Society B, 276:2755–2762, 2009.
[47] M. Moussaid, N. Perozo, S. Garnier, D. Helbing, and G. Theraulaz. The walking behaviour of pedestrian social groups and its impact on crowd dynamics. PLoS ONE, 2010.
[48] Billy Okal and Kai O. Arras. Formalizing Normative Robot Behavior. In Social Robotics: 8th International Conference, ICSR 2016, Kansas City, MO, USA, November 1-3, 2016, Proceedings, pages 62–71, Cham, 2016. Springer International Publishing.
[49] E. Pacchierotti, H.I. Christensen, and P. Jensfelt. Evaluation of passing distance for social robots. In IEEE Workshop on Robot and Human Interactive Communication (ROMAN), Hartfordshire, UK, September 2006.
[50] Noé Pérez-Higueras, Fernando Caballero, and Luis Merino. Learning robot navigation behaviors by demonstration using a RRT* planner. In International Conference on Social Robotics, pages 1–10. Springer International Publishing, 2016.
[51] Rafael Ramón-Vigo, Noé Pérez-Higueras, Fernando Caballero, and Luis Merino. Analyzing the relevance of features for a social navigation task. In Luis Paulo Reis, Antonio Paulo Moreira, Pedro U. Lima, Luis Montano, and Victor Munoz-Martinez, editors, Robot 2015: Second Iberian Robotics Conference, volume 418 of Advances in Intelligent Systems and Computing, pages 235–246. Springer International Publishing, 2015.
[52] Rafael Ramón-Vigo, Noé Pérez-Higueras, Fernando Caballero, and Luis Merino. A framework for modelling local human-robot interactions based on unsupervised learning. In International Conference on Social Robotics, pages 32–41. Springer International Publishing, 2016.
[53] J. Rios-Martinez, A. Spalanzani, and C. Laugier. From proxemics theory to socially-aware navigation: A survey. International Journal of Social Robotics, 7(2):137–153, 2015.
[54] Stéphane Ross, Joelle Pineau, Sébastien Paquet, and Brahim Chaib-Draa. Online Planning Algorithms for POMDPs. Journal of Artificial Intelligence Research, 32(2):663–704, July 2008.
[55] Francesco Setti, Chris Russell, Chiara Bassetti, and Marco Cristani. F-formation detection: Individuating free-standing conversational groups in images. PLoS ONE, 10(5):1–32, 2015.
[56] David Silver and Joel Veness. Monte-Carlo planning in large POMDPs. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 2164–2172. Curran Associates, Inc., 2010.
[57] Emrah Akin Sisbot, Luis Felipe Marin-Urias, Rachid Alami, and Thierry Siméon. A Human Aware Mobile Robot Motion Planner. IEEE Transactions on Robotics, 23(5):874–883, 2007.
[58] Gian Diego Tipaldi and Kai O. Arras. Planning problems for social robots. In Proc. International Conference on Automated Planning and Scheduling (ICAPS'11), Freiburg, Germany, 2011.
[59] Xuan-Tung Truong and Trung-Dung Ngo. Dynamic social zone based mobile robot navigation for human comfortable safety in social environments. International Journal of Social Robotics, pages 1–22, 2016.


[60] K.M. Tsui, M. Desai, H.A. Yanco, and C. Uhlik. Exploring use cases for telepresence robots. In Human-Robot Interaction (HRI), 2011 6th ACM/IEEE International Conference on, pages 11–18, March 2011.
[61] B. Ziebart, A. Maas, J. Bagnell, and A. Dey. Maximum entropy inverse reinforcement learning. In Proc. of the National Conference on Artificial Intelligence (AAAI), 2008.
