DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2019

A Novel SLAM Quality Evaluation Method

FELIX CAESAR

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


Master's Programme in Systems, Control and Robotics
Date: July 1, 2019
Supervisor: Jana Tumova
Examiner: John Folkesson
School of Electrical Engineering and Computer Science
Host company: ÅF
Swedish title: En ny metod för kvalitetsbedömning av SLAM


Abstract

Autonomous vehicles have grown into a hot topic in both research and industry. For a vehicle to be able to run autonomously, it needs several different types of systems to function properly. One of the most important of them is simultaneous localization and mapping (SLAM). It is used for estimating the pose of the vehicle and building a map of the environment around it based on sensor readings. In this thesis we have developed a novel approach to measure and evaluate the quality of a landmark-based SLAM algorithm in a static environment. The error measurement is a multi-error function and consists of the following error types: average pose error, maximum pose error, map error, number of false negatives, number of false positives and an error relating to the distance at which landmarks are added into the map. The error function can be tailored towards specific applications by setting different weights for each error.

A small research concept car with several different sensors and an outside tracking system was used to create several datasets. The datasets include three different map layouts and three different power settings on the car's engine to create a large variability in the datasets. FastSLAM and EKF-SLAM were used to test the proposed SLAM evaluation method. A comparison to just the pose error was made to assess whether our method can provide more information when establishing SLAM quality. Our results show that the pose error is often a good enough indicator of SLAM quality. However, it can occasionally be misleading with respect to errors related to mapping (location of landmarks, false negative and false positive landmarks). By using the method presented in this thesis, errors relating to the mapping are more easily detected than by looking at the pose error.

Sammanfattning

Autonomous vehicles have grown into an important topic in both research and industry. For a vehicle to be able to drive autonomously, several different systems need to function correctly. One of the most important of them is simultaneous localization and mapping (SLAM). It is used to estimate the vehicle's position and to build a map of the environment around it. In this thesis we have developed a new method to measure and evaluate the quality of a landmark-based SLAM algorithm in a static environment. The method consists of the following error types: the average pose error, the maximum pose error, the map error, the number of false negatives, the number of false positives and an error related to the distance at which a landmark is added into the map. By using weights for each error, the method can be tailored to a specific application.

A small concept car with several different sensors and an external tracking system was used to create several datasets. The datasets include three different map layouts and three different power settings for the car, to create a large variability in the datasets. FastSLAM and EKF-SLAM were used when testing the new SLAM quality evaluation method. The new method was compared against the pose error to analyse whether it is a better way of measuring SLAM quality. Our results show that the pose error is often a good enough indicator of SLAM quality, but it can occasionally be misleading with respect to errors in the mapping (positions of landmarks, false negatives and false positives). By using the method presented in this thesis, errors related to the mapping are easier to detect than by looking only at the pose error.

Acknowledgements

I would like to thank my supervisor at KTH, Jana Tumova, for her help during the thesis. A big thanks to ÅF, my supervisor Johan Olsén and my manager Mia Sköld for providing me with the material and help necessary for my thesis. I would also like to thank my fellow thesis students Hugo Skeppström, Richard Hedlund and Kristian Whalqvist for helping with different issues during the thesis. An extra thanks to Kristian for allowing me to use the EKF-SLAM he implemented.

Contents

1 Introduction
  1.1 Problem Definition
  1.2 Research Question and Hypothesis
  1.3 Scope
  1.4 Sustainability, Societal Impact & Ethics
  1.5 Outline

2 Background
  2.1 SLAM
    2.1.1 Formulation and Structure of the SLAM Problem
    2.1.2 Map Representations
    2.1.3 Data Association
    2.1.4 Solutions to the SLAM Problem
  2.2 Related Work
    2.2.1 Datasets & Creating Ground-Truth
    2.2.2 Evaluating the Quality of SLAM

3 Methods
  3.1 Multi-Error Function
    3.1.1 Pose Error
    3.1.2 Maximum Pose Error
    3.1.3 Map Error
    3.1.4 False Negative Error
    3.1.5 False Positive Error
    3.1.6 Distance Error
    3.1.7 Measurement Guidelines
  3.2 Experimental Setup
    3.2.1 Research Concept Car
    3.2.2 Ground-Truth Capturing System
    3.2.3 Datasets Created
    3.2.4 SLAM Algorithms Used During Testing

4 Results & Discussion
  4.1 Weights and Parameters Used in the Error Function
  4.2 Turn Datasets
    4.2.1 Turn Dataset Result Discussion
  4.3 S-curve Datasets
    4.3.1 S-curve Dataset Result Discussion
  4.4 Loop Datasets
    4.4.1 Loop Dataset Result Discussion
  4.5 All Results & Multi-Error Function Discussion

5 Future Work & Conclusions
  5.1 Conclusions
  5.2 Future Work

Bibliography

Appendix A
Appendix B
Appendix C

Chapter 1

Introduction

Autonomous vehicles (AVs) and mobile robots may soon overtake conventional vehicles and dominate the automotive industry. Once AVs reach a state where they are sufficiently reliable and affordable, they will impact several markets in many industries. It is estimated that AVs will have an economic impact of $1.2 trillion on the American automotive industry [1]. The effects of AVs can already be seen today: autonomous trucks are used for transportation in mines [2], autonomous cars are being tested and evaluated in urban environments [3] and several consumer-grade vehicles already contain rudimentary autonomous functions that reach the threshold of partial automation [4] [5]. The rise of AVs will not only impact conventional automotive industries, but can create new industries such as delivering packages with drones [6]. Strategy Analytics has predicted that fully autonomous vehicles will create a "Passenger Economy" that will represent a $7 trillion global opportunity in 2050 [7].

The emergence of AVs is carried by advancements in fields such as perception, sensors, path planning and machine learning. A field of great importance is Simultaneous Localization and Mapping (SLAM). Specifically, SLAM is a problem that arises when an agent does not have access to a map of the environment and needs to locate itself within this unknown environment while simultaneously constructing a map of it [8] [9]. Agent can refer to a car, a truck, a robot or something else that is equipped with the necessary equipment to perform SLAM. SLAM as a research area has made great progress over the past 30 years, transitioning to large-scale applications in several industries [10]. In [10], Cadena et al. try to answer the question "is SLAM solved?" They reach the conclusion that SLAM research is entering the robust-perception age. They argue that the robust-perception age and SLAM today are characterized by four key requirements: robust performance, high-level understanding, resource awareness and task-driven perception, and that the question "is SLAM solved?" cannot be answered without specifying a robot/environment/performance combination. To truly achieve autonomous robots, more research into SLAM is needed.

As the number of SLAM algorithms and applications that employ SLAM increases, the importance of evaluating and measuring the quality of SLAM is paramount. In this thesis, the problem of evaluating the quality of a landmark-based SLAM algorithm in a static environment is addressed and a novel approach is developed that incorporates several important aspects of the localization and map creation.

1.1 Problem Definition

SLAM can be divided into two different environment types: static and dynamic. Static means that all objects in the environment have a constant position, except the agent. In a dynamic environment there are, besides static objects, dynamic objects which move around as the mapping takes place. From an AV's perspective a static environment could, e.g., consist of buildings, intersections, road boundaries and street signs. All of these objects could represent potential landmarks for a SLAM algorithm. Hence, the focus in this thesis will be on static objects for landmark-based SLAM.

Almost all evaluations of SLAM algorithms are based only on the pose error [11] [12] [13]. Depending on the SLAM algorithm used, the map representation differs. We argue that, to get a better measurement of quality, the error measurement should be more rigorous and tailored towards specific map representations, as the choice of SLAM algorithm for a particular situation requires in-depth knowledge about its advantages and disadvantages [14].

Evaluating the quality of SLAM in a static environment can be divided into the following errors: things not detected (false negatives), things detected but in the wrong place, and things falsely detected (false positives). In addition, this thesis includes a novel error measurement relating to the distance between the agent and a landmark when the landmark is added into the map. The objective and purpose of this thesis is to create a novel way to measure and evaluate the quality of a landmark-based SLAM algorithm in a static environment by incorporating the following errors into a single function:

• Pose error, the actual path compared to the ground-truth path.

• Map error, the positional difference between real and estimated landmarks.

• False negatives.

• False positives.

• Distance error.

These five errors are interesting because they capture many important aspects of landmark-based SLAM; they are also interesting for what they can represent from an AV's perspective. False negatives and false positives can, e.g., represent important landmarks such as pedestrians and cars. The pose and map errors are extremely important for all SLAM algorithms. There are often other systems in an AV that use the information that SLAM provides, e.g. path planning. Hence, the distance to landmarks from the agent when they are added into the map can be very important; if they are added "too late" into the map they cannot be used by relevant subsystems.

1.2 Research Question and Hypothesis

SLAM is one of the most challenging problems in mobile robotics and there exists a large number of SLAM algorithms. The choice of which SLAM algorithm should be used for a particular application or solution requires prior knowledge about the advantages and disadvantages of that particular SLAM algorithm [14]. This calls for methods that can quantitatively evaluate SLAM algorithms by incorporating several aspects of the map creation, to better capture the quality of the map and its creation process. This thesis looks at a particular type of SLAM algorithm and map creation, summarized in the research question below.

How can the quality of a landmark-based SLAM algorithm be evaluated more rigorously by incorporating the following errors into the error measurement: pose error, map error, false negatives, false positives and when things (landmarks) are seen?

This research question and thesis will also evaluate the following hypothesis.

The choice of how the quality of SLAM is evaluated impacts which algorithm appears the most suitable for a given scenario.

This hypothesis is tested by comparing the error ranking (i.e. which SLAM algorithm has the best score) between the proposed method of evaluating SLAM quality developed in this thesis and just the pose error (as this is the most common way of measuring SLAM quality), for different SLAM algorithms and datasets. If the error ranking is the same over all datasets, it is an indication that the hypothesis is false and that just looking at the pose error is a good enough indication of SLAM quality.

1.3 Scope

The objective of the thesis is to create a novel way of measuring SLAM quality in a static environment, which will be referred to as the "multi-error function" from this point on in the report. The multi-error function incorporates several different errors that give an indication of the quality of the map created and thereby the quality of the SLAM algorithm. The multi-error function will be tested and evaluated for both EKF-SLAM and FastSLAM 1.0 and compared to the pose error. No suitable dataset exists for this evaluation, so during the thesis several datasets will be constructed with a small research concept car that is equipped with several sensors. The ground truth of the car and map will be created with a simple tracking system. The tracking system and the SLAM algorithms that will run on the car will be based as much as possible on previous work and open-source software.

1.4 Sustainability, Societal Impact & Ethics

The rise of AVs will reduce the need for conventional transportation, and millions of people who make their living from driving, such as cab and truck drivers, will lose their jobs and income. AVs will directly cause many people to lose their livelihood and this will severely increase unemployment in fields relating to human drivers [15]. However, the societal impact of AVs will not only be rising unemployment; it will in turn create new markets, such as the "Passenger Economy" mentioned in section 1.1. It might also cause other unforeseen societal impacts. A large part of society does not work directly with driving, but still owns and drives a car every day. What will happen when you no longer have to focus on driving, but can instead spend that time on some other task? Hence, the gain from AVs might outweigh the loss, as they will introduce new markets and create new job opportunities [16]. AVs have the potential to free up time for commuters, parking space can probably be greatly reduced, and accidents will probably decrease as well [17].

A large ethical dilemma is: if the car cannot prevent an accident, what should it do? Should it sacrifice the people in the car to save someone else? Drive off the road or crash into oncoming traffic? Should the car always follow traffic laws, even if this means hitting a pedestrian crossing the road illegally? This is a really difficult choice, as somewhere this decision has to be coded into the car and the value of a human life set. This is a classic example of the trolley problem [18]: "would you kill one to save five?". The answer to this question is difficult and setting a universal moral code for what the vehicle should do might be even more so. This is supported by a survey published in Nature [19] that shows that moral choices are not universal and can vary greatly.

AVs might have a very large positive impact on sustainability. They would probably reduce the number of accidents caused by human error and thereby save both lives and money. An AV would probably be much more efficient than a human when it comes to driving and would thereby greatly reduce emissions. Another potential positive impact arises if a fleet model is adopted, where a large number of vehicles are owned by a business or some other larger entity and deployed like taxis. This would greatly reduce the number of vehicles on our streets [20].

In this thesis, our research contained no ethically sensitive steps.

1.5 Outline

The report is divided into 5 chapters. Chapter 2 introduces SLAM from a technical standpoint along with several important aspects of SLAM, such as map representation and data association. The chapter also discusses previous work related to the thesis, such as different ways of evaluating SLAM quality and how to create ground truth. Chapter 3 explains the research concept car used and how the ground truth and datasets were created. It also goes into detail about the SLAM algorithms used during testing and explains in depth how the multi-error function works. Chapter 4 discusses the results of each dataset, the SLAM algorithms and how they affected the multi-error function. Chapter 5 discusses potential future work relating to evaluating SLAM quality and states the most important results and conclusions drawn in the thesis.

Chapter 2

Background

This chapter gives an overview of SLAM, research relating to creating ground truth and different ways of evaluating SLAM. Section 2.1 introduces the SLAM problem. Section 2.1.1 provides a common mathematical formulation and structure of the SLAM problem. Section 2.1.2 goes over three different map representations. Section 2.1.3 explains what the data association step of SLAM is. Section 2.1.4 explains two algorithms that are used to solve the SLAM problem. Section 2.2 describes the related work. Section 2.2.1 discusses work related to creating ground-truth and section 2.2.2 discusses work related to evaluating SLAM.

2.1 SLAM

The SLAM problem asks whether, and how, it is possible for an agent placed in an unknown environment at an unknown location to incrementally build a consistent map of its environment while simultaneously determining its location within this map. Agent can refer to a car, a truck or any type of robot. Environment can refer to basically anything: a city, a warehouse or an apartment room. To perform SLAM, a range of different sensors and algorithms are used. A common solution for solving the localization part for AVs is to use the Global Positioning System (GPS). However, it is not always available and its accuracy is in the meter range [21]. Instead, sensors located on the agent, e.g. Light Detection and Ranging (Lidar), Red Green Blue-Depth (RGB-D) cameras and stereo cameras, are used to identify landmarks in the environment from which the agent's position can be determined. A map of the landmarks' positions relative to the agent's position must be determined during operation, at the same time as the position of the agent must be estimated within the map. This creates a chicken-and-egg problem: the agent and landmark locations are needed to create the map, while the map is needed to establish the agent and landmark locations. The problem is solved by combining information from multiple sources. The on-board sensors provide estimated distances to potential landmarks and the agent's odometry provides an estimate of the agent's movement. Combining these allows for estimating both the map and the movement of the agent. There are several different algorithms and sensors available to solve the SLAM problem, each with their strengths and weaknesses.

2.1.1 Formulation and Structure of the SLAM Problem

SLAM is, to the best of our knowledge, always structured in the following probabilistic form, described by Thrun in [22] and Durrant-Whyte et al. in [8]:

$$P(x_t, m \mid z_{0:t}, u_{0:t}) \quad (2.1)$$

• $x_t$: the state vector describing the position and orientation of the vehicle (often referred to as the pose). Often the state vector is the current x-y coordinate and heading angle of the agent.

• $u_t$: the current control vector, applied to drive the agent to the next state. The control parameters could be the steering angle applied to the agent, the increase in velocity or whether the agent is braking.

• $m$: the map of the environment. In landmark-based SLAM it can, for example, be a vector of the x-y coordinates of all known landmarks.

• $z_t$: the current observation/measurement vector. It contains information from the sensors about landmark observations, e.g. the ranges and angles to the landmarks the sensors are currently detecting.

The probability distribution in Equation 2.1 describes the joint posterior density of the agent's state and the landmark locations given all measurements and control inputs (i.e. the distribution of an agent's potential location and map, given a set of observations and inputs). From an estimate of the current distribution, the next state distribution is calculated from the current control $u_t$, the current measurement $z_t$ and the Bayes filter. The Bayes filter is a two-step recursive process: it predicts the next state based on the current state and input, and then updates this state based on the current measurement. This process requires a transition model and an observation model. The transition model is given by the following distribution.

$$P(x_t \mid x_{t-1}, u_t) \quad (2.2)$$

The transition model is assumed to be a Markov process, i.e. the next state depends only on the previous state $x_{t-1}$ and the applied control $u_t$. The transition model is used in the predict step of the Bayes filter and the uncertainty increases during this step. The observation model is given by the following distribution.

$$P(z_t \mid x_t, m) \quad (2.3)$$

It describes the probability of making a specific observation $z_t$ given the current pose and map. The observation model is used in the update step of the Bayes filter and the uncertainty should decrease during this step. The Bayes filter algorithm is given in Equations 2.4 and 2.5 ($\eta$ in 2.5 is a normalizing factor that makes sure the distribution integrates to 1).

Predict

$$P(x_t, m \mid z_{t-1}, u_t) = \int P(x_t \mid x_{t-1}, u_t) \, P(x_{t-1}, m \mid z_{t-1}, u_{t-1}) \, dx_{t-1} \quad (2.4)$$

Update

$$P(x_t, m \mid z_t, u_t) = \eta \, P(z_t \mid x_t, m) \, P(x_t, m \mid z_{t-1}, u_t) \quad (2.5)$$

The joint posterior in Equation 2.1 can now be calculated from Equations 2.4 and 2.5.
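To make the predict-update recursion concrete, the following is a minimal sketch in Python on a discrete 1-D grid, where the integral in Equation 2.4 becomes a matrix-vector product. The models and numbers are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def predict(belief, transition):
    # transition[j, i] = P(x_t = j | x_{t-1} = i, u_t); Equation 2.4 as a sum.
    return transition @ belief

def update(belief, likelihood):
    # likelihood[j] = P(z_t | x_t = j, m); Equation 2.5 with normalizer eta.
    posterior = likelihood * belief
    return posterior / posterior.sum()

# Example: 5 grid cells, the agent moves one cell per step with probability 0.8.
belief = np.array([1.0, 0.0, 0.0, 0.0, 0.0])            # certain start at cell 0
transition = np.roll(np.eye(5), 1, axis=0) * 0.8 + np.eye(5) * 0.2
belief = predict(belief, transition)                     # uncertainty grows
belief = update(belief, np.array([0.1, 0.8, 0.1, 0.0, 0.0]))  # and shrinks again
```

Each predict step spreads the belief out and each update step concentrates it again, mirroring the behaviour of the transition and observation models described above.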

2.1.2 Map Representations

Depending on the environment, the sensors available, the feature extraction and the SLAM algorithm used, the map representation will look very different. There are many different types of sensors used in SLAM, e.g. stereo cameras and Lidar; Appendix B describes a few common sensors. Depending on the sensors and SLAM algorithm used, the features or landmarks that can be extracted from the environment differ. Feature extraction is detecting things in the environment that can be used in the map representation. Some algorithms use generic features such as Scale Invariant Feature Transform (SIFT) features and others use specific landmarks; see Appendix C for more information. The following are three common map representations/visualizations. In this thesis, the focus is on landmark-based maps. The other two are included to show how different the representations can be.

Landmark

In a landmark map, the map is composed of point landmarks. Usually, algorithms that utilize these types of maps work better with distinct landmarks, because they are easier to detect and extract from the environment during the feature extraction step. These features might even be artificial if the environment permits it; e.g. if mobile robots drive around a warehouse, artificial markers could be used so the robots can keep track of where they are. Usually a landmark is assigned a position and a covariance depending on how confident the agent is of the landmark's location. Sometimes a landmark can also be assigned an identifier, e.g. a color or size [22].

Occupancy Grid

An occupancy grid map is a fine-grained grid over a continuous space of locations. The map is divided into cells, where each cell has a value of 0 or 1, representing whether the cell is occupied or not. The accuracy of the map increases with smaller cells. However, decreasing the size of each cell increases the number of cells covering the same region and thereby the computational time. An occupancy grid map is usually a 2-D slice of the world, where each cell is a 2-D square. It can be extended into 3-D where each cell is a cube, but at a large computational cost. Compared to a landmark map (which relies heavily on a feature extraction step that usually only works with specific features), the occupancy grid map can map any generic environment, because it only fills a grid cell if a sensor detects something at the coordinates relating to that cell [23].
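As a minimal illustration of the cell discretization described above (names and values are assumptions, not from the thesis):

```python
import numpy as np

cell_size = 0.05                           # metres per cell
grid = np.zeros((80, 80), dtype=np.uint8)  # a 4 m x 4 m area, all free (0)

def mark_occupied(grid, x, y, cell_size):
    # A sensor hit at world coordinates (x, y) sets the matching cell to 1.
    # Assumes non-negative world coordinates within the grid extent.
    i, j = int(y / cell_size), int(x / cell_size)
    grid[i, j] = 1

mark_occupied(grid, x=1.23, y=0.87, cell_size=cell_size)
```

Halving `cell_size` quadruples the number of cells for the same area, which is the accuracy/computation trade-off mentioned above.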

Point Cloud

Visual SLAM algorithms based on feature extraction such as SIFT, SURF and ORB (see Appendix C) can represent their maps with point clouds. They find features in the environment based on their extraction algorithm and each feature is assigned a position and orientation. The features can then be mapped into a 3-D map visualized as a point cloud.

2.1.3 Data Association

For each landmark detected by the feature extraction, a calculation is done at each update step of the SLAM algorithm to decide whether the detected landmark is a new landmark that needs to be added into the map, or an already mapped landmark. This process is called data association. Depending on the SLAM algorithm used, the data association step and algorithm can differ. The following explanation is valid for landmark-based SLAM data association.

The compatibility between an observation (a feature/landmark from the feature extraction step) and the mapped landmarks is calculated. If this compatibility measure is larger than a set threshold for all landmark and observation combinations, the observation is treated as a new landmark and is added into the map. This process is done for each individual observation (each potential new unmapped feature) at each time step. This compatibility test is usually called "Individual Compatibility Nearest Neighbor" (ICNN) and is popular because of its simple implementation and low computational time, as it performs at most mn tests at each timestep (m mapped landmarks and n observations) [24]. However, it is not considered sufficiently robust for many tasks, as it cannot handle faulty associations well. Instead, "Joint Compatibility Branch and Bound" (JCBB) can be used [24]. JCBB considers all observations simultaneously at each timestep, whereas ICNN considers each one individually. This leads to a much more robust algorithm, but the computational time is much higher. A sketch of ICNN-style association is given below.
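The following is a hedged sketch of ICNN-style association, assuming 2-D point landmarks with a per-landmark covariance; the gate value 5.99 corresponds to a 95% chi-square threshold with two degrees of freedom. It performs the at-most m·n compatibility tests described above.

```python
import numpy as np

def icnn(observations, landmarks, covariances, gate=5.99):
    # observations: (n, 2) points; landmarks: (m, 2); covariances: (m, 2, 2).
    # For each observation, find the most compatible mapped landmark; if no
    # landmark passes the gate, the observation is treated as a new landmark.
    associations = []
    for z in observations:
        best, best_d2 = None, gate
        for j, (lm, S) in enumerate(zip(landmarks, covariances)):
            v = z - lm
            d2 = v @ np.linalg.solve(S, v)   # squared Mahalanobis distance
            if d2 < best_d2:
                best, best_d2 = j, d2
        associations.append(best)            # None => add as new landmark
    return associations
```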

2.1.4 Solutions to the SLAM Problem

There exists a great number of solutions to the SLAM problem and they can be divided into two main groups: online SLAM and full SLAM. Full SLAM tries to solve not only for the current pose but for the whole trajectory, which makes it much more computationally demanding and usually unable to run in real time. Online SLAM solves only for the current pose. This section describes two popular online SLAM algorithms, based on the Kalman filter and the particle filter.

Extended Kalman Filter

One of the earliest and probably most influential SLAM algorithms is based on the Extended Kalman Filter (EKF). It was first published by Moutarlier and Chatila in 1989 [25]. In EKF-SLAM the map is represented by features in the form of point landmarks. Usually these landmarks are specific things in the environment, which means that EKF-SLAM relies on good feature extraction algorithms. EKF-SLAM makes a Gaussian noise assumption for both the transition and observation model. It represents the pose and map with a single large mean vector $\mu_t$ and covariance matrix $\Sigma_t$ that include both the pose and the locations of all currently known landmarks. Because EKF-SLAM uses linearization to approximate non-linear systems, the system must be relatively linear around the working point; if the true state is not close to this point, the uncertainty will not converge. Another problem with EKF-SLAM is that it can only track one potential mode (one hypothesis for the pose and each landmark) and that it does not scale well.

Two popular variants of the EKF are the unscented Kalman filter (UKF) and the sparse extended information filter (SEIF). The UKF tries to provide a better solution to the problem of linearizing a function. The Taylor series expansion used by the EKF is not the only way to linearize the transformation of a Gaussian; the UKF instead approximates the Gaussian by passing a number of specific points through the nonlinear function and estimating the Gaussian from those values. The UKF yields the same result as the EKF for a linear system, but performs equally well or better for a non-linear system and should be more robust [26]. The SEIF tries to reduce the size of the covariance matrix. The standard information filter is subject to the same assumptions as the EKF, but represents Gaussians in their canonical form instead of as a mean and covariance. The SEIF creates a sparse information matrix by ignoring links with very low values in the information matrix. This makes the SEIF algorithm more efficient, because of the data reduction [27].

Particle Filter

A particle filter is a nonparametric approach used for approximating posteriors over continuous spaces with a large number of hypotheses (i.e. particles/samples). The key idea is to represent the posterior distribution, given noisy observations, by drawing random samples from this posterior. Each sample is a hypothesis of the real state (location and map) and, because no assumption about the system is made, particle filters can represent a large number of different systems [26]. However, a naive approach to SLAM based on particle filters is infeasible, as it scales exponentially.

Particle filters were first introduced into the area of SLAM by Murphy in 2000/2001 [28]. A couple of years later Montemerlo introduced FastSLAM [29], based on Rao-Blackwellized particle filters (RBPF), which improve the sampling compared to previous particle filters by marginalizing out some variables and thereby making the problem tractable. What makes FastSLAM useful is that each particle has its own version of the map and that the individual map errors are conditionally independent, which makes it possible to factor the mapping problem into separate problems. For each particle, FastSLAM uses a separate low-dimensional EKF for each individual landmark, compared to EKF-SLAM which uses a single Gaussian for all landmarks (this is what makes FastSLAM computationally efficient). Because data association is done on a per-particle basis, FastSLAM is more robust than EKF-SLAM and can track several hypotheses of possible true states [30]. A sketch of the particle structure is given below.
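As an illustration of the factorization described above, a FastSLAM particle can be sketched as follows. The structure and names are hypothetical, not the thesis implementation: each particle carries its own pose hypothesis and its own map, where every landmark is tracked by a separate low-dimensional EKF (2-D mean plus 2x2 covariance) instead of one large joint Gaussian as in EKF-SLAM.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class LandmarkEKF:
    mean: np.ndarray          # estimated (x, y) of the landmark
    cov: np.ndarray           # 2x2 covariance of that estimate

@dataclass
class Particle:
    pose: np.ndarray          # (x, y, theta) hypothesis of the agent
    weight: float = 1.0
    landmarks: dict = field(default_factory=dict)  # landmark id -> LandmarkEKF

# The filter maintains many such hypotheses in parallel:
particles = [Particle(pose=np.zeros(3)) for _ in range(100)]
```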

2.2 Related Work

This section presents work related to the topics of how to create ground-truth and how to evaluate the quality of SLAM.

2.2.1 Datasets & Creating Ground-Truth

To evaluate the quality and performance of SLAM, the estimated map and localization need to be compared to the real map and position, commonly known as ground truth. A problem with ground truth is that acquiring it is usually cumbersome and requires expensive equipment. This means many evaluations of SLAM are made on a few pre-recorded datasets that may not provide the information necessary for in-depth evaluations. However, if all SLAM evaluations used their own datasets it would be difficult to quickly compare different SLAM algorithms to each other. Hence, a few benchmark datasets are often used to compare different SLAM algorithms. A comprehensive list of about twenty datasets published in [10] can be found at [31]. Three popular ones are the KITTI dataset [32], the EuRoC dataset [33] and the TUM dataset [34].

The KITTI dataset contains sequences recorded from a Volkswagen station wagon in freeway, rural and inner-city environments with static and dynamic objects. A variety of sensors were used to record the data, including color and greyscale stereo cameras, a Velodyne 3D laser scanner and a GPS/IMU inertial navigation system. Because of the wide range of sensors used in recording the datasets, it is a popular benchmark for evaluating different SLAM algorithms. It was used in evaluating ORB-SLAM2 [11] and several other visual SLAM algorithms. However, because of the nature of the dataset and the different SLAM solutions, the errors that are compared are just the translation RMSE (i.e. pose error).

The EuRoC dataset contains stereo sequences, IMU measurements and accurate ground truth, recorded from a micro-aerial vehicle in three different environments of varying difficulty. To create their ground truth the authors used a Leica laser tracker and a Vicon motion capture system, and they state that they reach millimeter accuracy. The TUM dataset contains sequences for evaluating RGB-D SLAM systems. A Microsoft Kinect camera was used as the sensor and the ground truth was obtained with a motion capture system with eight high-speed tracking cameras. The datasets consist of office environments and an industrial hall, and include both a hand-held Kinect sensor and a Kinect sensor mounted on a robot.

Koch et al. [35] attached a laser scanner to a robot arm for recording datasets and accurate ground truth. This enabled them to record six-degree-of-freedom datasets with sub-millimetre precision ground truth. In [36] Schmidt et al. use a Microsoft Kinect and a motion capture system consisting of five cameras to create the dataset and ground truth, similar to the TUM dataset [34]. Bayer et al. [37] propose using the visual SLAM algorithm itself to create the ground truth (instead of relying on expensive external localization systems) by using offline frame-by-frame processing. Their results show that it is a feasible alternative if better equipment is not available, but it takes a long time.

2.2.2 Evaluating the Quality of SLAM

A problem with evaluating the quality of SLAM is that there are multiple ways this can be done. As mentioned in section 2.2.1, often only the pose is used for evaluating the performance. Depending on what implementation of SLAM is used, the process of creating the map and localizing within it varies greatly. Hence, more in-depth evaluations of different algorithms and map representations can also vary greatly; creating an in-depth evaluation that works, e.g., on both landmark maps and occupancy grid maps might be very difficult.

To evaluate the quality and performance of SLAM, accuracy and precision can be useful in certain situations, but there are other ways of quantifying the error or errors. Just a few potential error measurements are the pose error, the map error, the number of false negatives and the number of false positives. Depending on the SLAM algorithm, the map representation of the world can look very different. E.g. a landmark-based SLAM has a clear indication of what a false positive landmark is, while some visual SLAM algorithms that use generic features such as SIFT and SURF (see Appendix C) do not have a clear indication of what a false positive is, as they use anything in the environment that fulfils the criteria for that specific feature. Hence, how to quantify the quality and performance of SLAM algorithms is a difficult topic. Vivet et al. in [38] state the following:

”A potential consumer of an algorithm’s output needs to know what types of incorrect/invalid results to expect, as some types of results might be acceptable while others are not. Thus multiple metrics are necessary for potential consumers to make intelligent decisions.” [38, p. 2]

and they suggest the following performance metrics: rate of convergence, computational complexity and quality of the results (error of the trajectory). All of these might be interesting when evaluating and comparing SLAM algorithms, instead of just looking at the pose error as many SLAM evaluations do. It might also be interesting to look at quality measurements created specifically for a particular type of SLAM algorithm, its map representation and localization method.

In [14] Filatov et al. suggest three different metrics for evaluating and determining the quality of a map without relying on ground truth or trajectory; only the real and estimated maps are needed. Their proposed error metrics only work on 2D occupancy grid maps (map representations that are produced by SLAM algorithms such as Google Cartographer [39] and VinySLAM [40]). The three error metrics they suggest are: comparing the pixel values of all estimated walls or other objects with the real map, comparing the number of corners in the estimated map with the real map, and the number of enclosed areas (areas in the map that are completely enclosed when they should not be).

Kummerle et al. discuss and provide a solution to the problem of looking at the pose error from a global perspective in [41]: an early rotation error will cause a larger total pose error than a late rotation error, even though the same error occurred, just at a different time. To mitigate this they suggest calculating the error based on the corrected trajectory, so that a late and an early rotation error (of the same size) have the same impact on the pose error.

In [38] Vivet et al. perform a novel evaluation of line-based EKF-SLAM (where the landmarks are, e.g., walls) by looking at the 2D line-map quality. They divide the error into three metrics: map geometry quality, segment length quality and over- or under-segmentation. The map geometry quality is a measurement of the dissimilarity in angle and length to the origin between two landmarks, the ground-truth landmark and the estimated landmark. The segment length quality measures how much the estimated landmark and the ground-truth landmark overlap (i.e. how equal they are). The over- or under-segmentation error is the difference between unmatched features.

Varsadan et al. propose a method in [42] for evaluating map quality based on an image similarity function from [43], defined as:

”The similarity function Ψ is defined as a sum over all colors of the average Manhattan-distance of the pixels with color c in picture a to the nearest pixel with color c in picture a’ and the average distance vice versa.” [42, p. 356]

But instead of using pixel colors, c corresponds to the probability of a cell in an occupancy grid map. They state that the main advantage of this method is that it can be computed very efficiently. They also discuss using cross entropy, which measures the difference between two probability distributions and can be used to measure the correlation between two occupancy grids. However, for it to give a meaningful result the ground-truth map and estimated map have to be relatively well aligned.

Both Schuhmacher et al. and Barrios et al. discuss how to measure the error between two sets of vectors in [44] [45] (the sets could, e.g., represent estimated and real landmarks). They suggest incorporating both localization errors and cardinality errors (the difference in the number of vectors) into the total error.

According to Amigoni et al. [46], the evaluation of SLAM can be divided into two broad categories: intrinsic, referring to system efficiency such as computational time, and extrinsic, referring to the evaluation of the final solution, e.g. the pose error. They state different ways in which computational time could be measured: measuring the processing time in seconds, the number of floating point operations or the number of iterations.

Because of the large number of existing SLAM algorithms, it is probably not possible to create a unified metric or measurement (except the pose error) for comparing all SLAM algorithms. Instead, the choice of SLAM algorithm for a particular situation requires in-depth knowledge about its advantages and disadvantages [14]. Hence, just looking at the pose error when evaluating SLAM might not be good enough, as it gives very little detail of what the positive or negative aspects of a particular SLAM algorithm are. The evaluation of a particular SLAM algorithm should therefore go deeper; e.g. evaluating a landmark-based SLAM algorithm and an occupancy-grid-based SLAM algorithm should differ, as they reach their final solutions very differently. Valls et al. designed the winning car in the "Formula Student Driverless Competition" in 2017 [47]. They used the output from SLAM in their path planning stage and state that, for the car to reach its full potential, the track needs to be known 2 s ahead. Hence, the distance between the landmark and the agent when a landmark is added into the map might be a very good error measurement when evaluating SLAM algorithms for this application, because the further away the algorithm is able to detect objects, the better the car will perform.

Chapter 3

Methods

This chapter describes the experimental setup used to create the datasets and explains the multi-error function in detail. Section 3.1 gives an overview of the multi-error function and sections 3.1.1-3.1.7 go into detail. Section 3.2 describes the experimental setup used to create the datasets: section 3.2.1 goes over details relating to the research concept car, section 3.2.2 describes the ground-truth capturing system, section 3.2.3 describes the datasets created and section 3.2.4 describes the SLAM algorithms used during testing.

3.1 Multi-Error Function

The novel SLAM quality evaluation method comes in the form of a multi-error function. It is divided into six parts, each relating to a different type of error. The Total error is given in Equation 3.1 and each part is explained subsequently. The pose error is given by $e_{p_1}$, the maximum pose error by $e_{p_2}$, the map error by $e_{map}$, the false negative error by $e_{fn}$, the false positive error by $e_{fp}$ and the distance error by $e_d$. Each error is defined in its respective section. Each error has a weight $w_i$ that is used to set the error's impact on the Total error. Table 3.1 defines all variables and constants used in the multi-error function.


Table 3.1: The variables and constants used in the multi-error function.

Variable         Definition
$P$              The ground-truth path, described by x and y coordinates.
$\hat{P}$        The estimated path, described by x and y coordinates.
$M$              The ground-truth x and y positions of the landmarks.
$\hat{M}$        The estimated x and y positions of the landmarks.
$q$              The number of ground-truth landmarks that have a corresponding estimated landmark within the cut-off boundary; also referred to as the number of pairings.
$M_{sub}$        The ground-truth paired landmarks.
$\hat{M}_{sub}$  The estimated paired landmarks.

User constants   Definition
$w_i$            The weight of each error.
$c$              The cut-off boundary for landmark pairing.
$d$              The maximum distance at which a landmark can be detected and added into the map.

$$\mathrm{Total} = E(P, \hat{P}, M, \hat{M}) = w_1 e_{p_1} + w_2 e_{p_2} + w_3 e_{map} + w_4 e_{fn} + w_5 e_{fp} + w_6 e_d \quad (3.1)$$
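Equation 3.1 translates directly into code; a minimal sketch:

```python
import numpy as np

def total_error(errors, weights):
    # errors  = [e_p1, e_p2, e_map, e_fn, e_fp, e_d], computed per dataset
    # weights = [w_1, ..., w_6], chosen by the user to fit the application
    return float(np.dot(weights, errors))
```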

An example of all errors except $e_{p_2}$ and $e_d$ is given in Figure 3.1. The figure shows an example map with ground-truth landmarks (red dots), estimated landmarks (green dots), the ground-truth path (blue line) and the estimated path (black line). The blue circle around each ground-truth landmark is the pairing distance. The pairing distance is the maximum distance an estimated landmark can be from a ground-truth landmark for it to be counted as the same landmark. If an estimated landmark is not inside any pairing boundary, it will be counted as a false positive. The difference between the ground-truth pose and the estimated pose is the pose error. The map error is the distance between a landmark and an estimated landmark located within the blue circle boundary (i.e. the pairing distance). In the figure, two of the circles have an estimated landmark within their boundary; both will count towards the map error. The blue circle that contains zero estimated landmarks will count towards the false negative error. The estimated landmark that is not within any of the three blue circles will count towards the false positive error.

Both the pose error ($e_{p_1}$) and the maximum pose error ($e_{p_2}$) are common ways of measuring and evaluating SLAM quality. To the best of our knowledge, the map error ($e_{map}$), the false negative error ($e_{fn}$) and the false positive error ($e_{fp}$) have not been used in this way in SLAM evaluation before. The distance error ($e_d$), and the combination of several errors into a single weighted function, have to the best of our knowledge not been done before and are completely novel.

Figure 3.1: An example map with ground-truth landmarks (red dots), estimated landmarks (green dots), the ground-truth path (blue line) and the estimated path (black line). The blue circle around each ground-truth landmark is the pairing distance; an estimated landmark must be within this distance to not count as a false positive.

3.1.1 Pose Error

The pose error is the mean difference between the estimated path and the ground-truth path. It is a discrete function over all positions the SLAM algorithm estimated and the ground-truth path, and it is defined in Equation 3.2:

$$e_{p_1}(P, \hat{P}) = \frac{1}{n} \sum_{i=1}^{n} \sqrt{(p_i^x - \hat{p}_i^x)^2 + (p_i^y - \hat{p}_i^y)^2} \quad (3.2)$$

$$P = \begin{pmatrix} p_1^x & p_2^x & \dots & p_n^x \\ p_1^y & p_2^y & \dots & p_n^y \end{pmatrix} \qquad \hat{P} = \begin{pmatrix} \hat{p}_1^x & \hat{p}_2^x & \dots & \hat{p}_n^x \\ \hat{p}_1^y & \hat{p}_2^y & \dots & \hat{p}_n^y \end{pmatrix}$$

$P$ is the ground-truth path and $\hat{P}$ is the estimated path. Depending on the dataset and SLAM algorithm used, the lengths of $P$ and $\hat{P}$ may not be equal and this needs to be handled appropriately. They might not contain the same number of data points, as the frequency of the SLAM algorithm and the frequency of the system that captures the ground truth might not be the same. Two solutions to this are to either interpolate (extend with estimated data points) the shorter sequence to the same length as the longer one, or to remove data points in the longer sequence that do not have a corresponding data point in the shorter sequence. The pose error only measures the position error and not the orientation error. Figure 3.2 gives an example of what the pose error can look like.
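A sketch of Equations 3.2 and 3.3, assuming both paths come with timestamps so that the estimated path can be interpolated onto the ground-truth timestamps (one of the two alignment strategies mentioned above). The timestamp inputs are an assumption, not part of the original definitions.

```python
import numpy as np

def pose_errors(t_gt, P, t_est, P_hat):
    # P: (n, 2) ground-truth positions at times t_gt (increasing).
    # P_hat: (k, 2) estimated positions at times t_est (increasing).
    x_hat = np.interp(t_gt, t_est, P_hat[:, 0])   # align the estimate to the
    y_hat = np.interp(t_gt, t_est, P_hat[:, 1])   # ground-truth timestamps
    dists = np.hypot(P[:, 0] - x_hat, P[:, 1] - y_hat)
    return dists.mean(), dists.max()   # e_p1 (Eq. 3.2), e_p2 (Eq. 3.3)
```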

Figure 3.2: An example of the pose error and the maximum pose error. The pose error is the mean distance between the ground-truth path (blue line) and the estimated path (black line). The maximum pose error is where the distance between these two lines is the greatest. Applying Equations 3.2 and 3.3 gives the following errors: $e_{p_1} = 0.3359$ and $e_{p_2} = 1.1557$.

3.1.2 Maximum Pose Error

The maximum pose error is the largest single pose error between the estimated and ground-truth path, i.e. the maximum error that occurred between the estimated position and the ground-truth position during runtime. It is similar to the pose error, but instead of averaging the error over all positions only the maximum is kept. It is defined in Equation 3.3:

$$e_{p_2}(P, \hat{P}) = \max_{i=1 \dots n} \sqrt{(p_i^x - \hat{p}_i^x)^2 + (p_i^y - \hat{p}_i^y)^2} \quad (3.3)$$

The error only includes the maximum position error and not the orientation error. Figure 3.2 gives an example of what the maximum pose error can look like.

3.1.3 Map Error

The map error is the mean distance between the positions of all paired estimated landmarks and ground-truth landmarks, and it is defined in Equation 3.4:

$$e_{map}(M_{sub}, \hat{M}_{sub}) = \frac{1}{q} \sum_{i=1}^{q} \sqrt{(m_i^x - \hat{m}_i^x)^2 + (m_i^y - \hat{m}_i^y)^2} \quad (3.4)$$

$$M = \begin{pmatrix} m_1^x & m_2^x & \dots & m_n^x \\ m_1^y & m_2^y & \dots & m_n^y \end{pmatrix} \qquad \hat{M} = \begin{pmatrix} \hat{m}_1^x & \hat{m}_2^x & \dots & \hat{m}_n^x \\ \hat{m}_1^y & \hat{m}_2^y & \dots & \hat{m}_n^y \end{pmatrix}$$

$M$ contains the positions of the ground-truth landmarks and $\hat{M}$ the positions of the estimated landmarks (they do not have to be the same length). The pairing is made by finding the $q$-landmark subpattern of $\hat{M}$ that is closest to $M$, cut off at the boundary $c$. $c$ is the maximum distance from a landmark at which an estimated landmark can be paired with it. This means that each ground-truth landmark (entry in $M$) tries to find the closest estimated landmark (entry in $\hat{M}$) within the cut-off boundary (i.e. the pairing distance). The optimal subpattern (shortest distance) can be found with the Hungarian method [48]; a sketch is given after this paragraph. $M_{sub}$ and $\hat{M}_{sub}$ are the locations of the paired ground-truth and estimated landmarks. Figure 3.3 gives an example of what the map error could look like. The landmark in the lower right corner has no estimated landmark within its pairing boundary and will therefore not count towards the map error; it will instead count towards the false negative error. The estimated landmark in the lower right corner has not been paired with any landmark and will thereby count towards the false positive error. There are two estimated landmarks within the boundary of the landmark in the upper right corner, but only the closest one will be paired; the other one will count towards the false positive error.
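A sketch of the pairing and the map error, using the Hungarian method as in the text. Distances are clipped at the cut-off $c$ before the assignment so that pairs beyond the boundary do not distort it; this exact sub-pattern selection is an assumption about the implementation. The pairing count $q$ also yields the false negative and false positive errors defined in the next two subsections.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def map_error(M, M_hat, c):
    # M: (Mn, 2) ground-truth landmarks; M_hat: (Mn_hat, 2) estimated landmarks.
    D = cdist(M, M_hat)                           # all pairwise distances
    rows, cols = linear_sum_assignment(np.minimum(D, c))  # Hungarian method
    paired = D[rows, cols] < c                    # enforce the cut-off boundary
    q = int(paired.sum())
    e_map = float(D[rows, cols][paired].mean()) if q > 0 else 0.0  # Eq. 3.4
    e_fn = len(M) - q                             # Equation 3.5
    e_fp = len(M_hat) - q                         # Equation 3.6
    return e_map, q, e_fn, e_fp
```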

3.1.4 False Negative Error

The false negative error counts all landmarks that are in $M$ but not in $M_{sub}$ (i.e. ground-truth landmarks that were not paired with any estimated landmark in the previous step). The error is defined in Equation 3.5:

$$e_{fn}(M_n, q) = |M_n - q| \quad (3.5)$$

Figure 3.3: An example of what the map error can look like with the boundary set to $c = 0.5$. The map error is the mean distance between each paired ground-truth landmark (red dot) and estimated landmark (green dot); the map error in this figure is $e_{map} = 0.2688$. The circle around each ground-truth landmark is the maximum distance at which a pairing can occur. The estimated landmark in the bottom right of the figure is a false positive, as it is not inside any pairing boundary. The ground-truth landmark in the bottom right corner is a false negative, as there are no estimated landmarks within its boundary.

$M_n$ is the number of ground-truth landmarks. Figure 3.3 gives an example of what the false negative error could look like: there are a total of four landmarks and three pairings, which yields the error $|M_n - q| = |4 - 3| = 1$. Compared to the three previous errors (pose, max pose and map error), this type of error does not have an inherent distance tied to it. Hence, it is important to set its weight to a value that is consistent with the other errors. We think a good starting value is to set it equal to the boundary value $c$ used for the pairings. This caps each false negative error at the same value as the maximum value of each individual map error ($e_{map}$). If this error's weight were set to a larger value, a small difference in an estimated landmark's position, moving it outside or inside the boundary $c$, might cause a disproportionate jump in the Total error.

3.1.5 False Positive Error

The false positive error counts all landmarks that are in $\hat{M}$ but not in $\hat{M}_{sub}$ (i.e. estimated landmarks that were not paired with a ground-truth landmark in the previous step). The error is defined in Equation 3.6:

$$e_{fp}(\hat{M}_n, q) = |\hat{M}_n - q| \quad (3.6)$$

$\hat{M}_n$ is the number of estimated landmarks. Figure 3.3 gives an example of what the false positive error could look like: there are a total of five estimated landmarks and three pairings, which yields the error $|\hat{M}_n - q| = |5 - 3| = 2$. As with the false negative error, this type of error does not have an inherent distance tied to it; the reasoning for setting its weight is exactly the same as for the false negative error, see section 3.1.4.

3.1.6 Distance Error

The distance error is a measurement of how close the agent was to its maximum detection range the moment a landmark was added to the map during runtime. The error is defined in Equation 3.7:

$$e_d(\hat{P}, \hat{M}_{sub}) = \frac{1}{q'} \sum_{i=1}^{q'} g(l_i)$$

$$g(l_i) = \begin{cases} d - l_i, & \text{if } d > l_i \\ 0, & \text{otherwise} \end{cases} \quad (3.7)$$

$$l_i = \sqrt{(\hat{p}_i^x - \hat{m}_i^x)^2 + (\hat{p}_i^y - \hat{m}_i^y)^2}$$

$\hat{p}_i$ is the position of the agent at the time the corresponding landmark is added to the map. $d$ is a parameter that represents the maximum distance at which it is possible for a landmark to be detected and added into the map. Hence, the user needs to calculate this value based on, e.g., the sensors used. Figure 3.4 gives an example of what the distance error could look like. The difference in length between the maximum distance $d$ (dotted black line) and the distance of detection (dotted red line) is the error for each (paired) landmark added into the map. In this example the maximum distance is visualized as a circle, as it represents a sensor in the form of a 360° 2-D Lidar that can see in all directions. If the sensor is a 180° 2-D Lidar or a camera, it should not be a circle but a cone, as that better represents the sensor used. However, landmarks that are added into the map before the agent starts moving are not counted towards this error, as the agent can start next to landmarks that are already within its maximum detection range.
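A sketch of Equation 3.7, assuming the caller supplies, row by row, the agent position at the moment each paired landmark was inserted and the corresponding landmark estimate. Landmarks mapped before the agent starts moving should already be filtered out, as noted above.

```python
import numpy as np

def distance_error(P_hat_at_insert, M_hat_sub, d):
    # P_hat_at_insert: (q', 2) agent positions at each landmark insertion.
    # M_hat_sub: (q', 2) matching estimated landmark positions.
    l = np.hypot(P_hat_at_insert[:, 0] - M_hat_sub[:, 0],
                 P_hat_at_insert[:, 1] - M_hat_sub[:, 1])
    g = np.where(d > l, d - l, 0.0)   # g(l_i) from Equation 3.7
    return float(g.mean()) if len(g) else 0.0
```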

Figure 3.4: An example of what the distance error can look like with a maximum detection range of $d = 3$. The large black dot is the agent and the black circle surrounding it is the maximum distance at which a landmark can be detected. The dotted red line is the distance from the agent to the landmark currently being detected. The error is 0.8664, the difference between the maximum distance $d$ and the distance at which the landmark was detected inside this range.

3.1.7 Measurement Guidelines

The intention of this error measurement is to provide a more in-depth evaluation of landmark-based SLAM than simply the pose error and thereby provide a better measurement of quality. One can look at the Total error or at each error individually, depending on what the user thinks is more significant. The function contains 2 parameters and 6 weights, one for each sub-error. With these, the user can tailor the error measurement towards what they think is most important. By setting different values for the parameter $c$, the pairing distance shifts and the limit for whether an estimated landmark is counted as a false positive changes; setting a low $c$ value penalizes a SLAM algorithm that does not estimate landmarks close to their ground-truth positions. By setting different values for the weights, the user can penalize the errors differently. The parameter $d$ can be difficult to set, as it depends on the feature extraction step (which happens "before" the SLAM step) and thereby on the sensors of the agent. The map is landmark-based and each landmark is only described by two coordinates; it contains no information about what the landmark looks like. Hence, it is assumed that the central position is an adequate description of the landmark, which holds for landmarks of simple geometrical shapes, such as cones, pillars or cars.

Two examples are given of how $d$ can be established. In the paper [49] by S. Wen et al., camera recognition and a laser scanner were used together with EKF-SLAM for autonomous navigation of a robot. Camera recognition was used to detect landmarks and the scanner was then used to measure the distance to the object. Hence, the distance $d$ would be based on the maximum distance at which the camera could recognize landmarks. The specific distance could either be determined through experiments or by analysing the image recognition algorithm. If the Lidar given in Table 3.2 were used for feature extraction and detecting potential landmarks, a value for $d$ could be calculated directly based on the geometry of the landmark and the number of "laser hits" needed for detection. An example of this calculation is given in Appendix A, and a sketch of the idea is given below.
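A sketch of the second example, under simple small-angle assumptions: a landmark of width w at range r receives roughly w / (r · Δθ) beams from a Lidar with angular resolution Δθ, so the largest range at which the feature extraction still receives its required minimum number of hits is d = w / (n_min · Δθ). The numbers below are illustrative, not the values used in Appendix A.

```python
import math

def max_detection_range(w, n_min, dtheta_deg):
    # w: landmark width in metres; n_min: minimum laser hits required by the
    # feature extraction; dtheta_deg: Lidar angular resolution in degrees.
    dtheta = math.radians(dtheta_deg)
    return w / (n_min * dtheta)

# e.g. a 7 cm wide cone, 3 required hits, 0.9 degree angular resolution:
d = max_detection_range(w=0.07, n_min=3, dtheta_deg=0.9)   # roughly 1.49 m
```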

3.2 Experimental Setup

Because no datasets appeared to exist that could be used to properly evaluate the multi-error function, new datasets were constructed. This section describes how these datasets were created, as well as what SLAM algorithms were used during testing.

3.2.1 Research Concept Car

The mobile platform with all the sensors needed to perform SLAM was a small research concept car. A picture of the car is given in Figure 3.5 and its main hardware is listed in Table 3.2. The car was constructed at ÅF by the thesis students Jonathan Cressell and Isac Tornberg in 2018 and is described in their report [50].

Table 3.2: The car's hardware specification.

Type             Specification
RC car           Himoto Tanto BL 1:10
Embedded system  Nvidia Jetson TX2 Dev Kit
Stereo camera    Stereolabs ZED
Lidar            RPLidar A2M8
IMU              Sparkfun MPU-9250 9-DoF
Motor driver     vESC 4.12

The car is a radio-controlled car with additional sensors that can be used for SLAM or other applications. The Jetson TX2 development board handles all the communication with the different sensors and motors. Ubuntu is installed on the board and the Robot Operating System (ROS) is used for writing all the software running on the car. The car is equipped with a brushless DC motor which is controlled by the Vedder Electronic Speed Controller (vESC). The vESC is also used to control the steering and it provides odometry by measuring the back electromotive force (back-EMF). Movement and odometry are also provided by the IMU. Visual data (stereo images) is provided by the ZED camera and the Lidar provides 360° 2-D laser scans.

Figure 3.5: The car used in last year's thesis and in this thesis [50], and the marker used for localization [51]. Markers were placed on the ground where the cones were going to be placed, and one was also placed on top of the car so that it could be tracked.

3.2.2 Ground-Truth Capturing System

The ground truth, both for the positions of the landmarks and the trajectory of the car, was captured with the help of the WhyCon tracking system created by T. Krajnik et al. [52]. The localization system is based on an efficient algorithm for detecting circular black-and-white marker patterns; Figure 3.5 shows an example of what the marker looks like. The markers are tracked with a camera and, because the size of the marker is known, the image coordinates can be translated into real-world coordinates. The landmarks and the car are tracked in two steps. First the map is created by placing markers within the camera's view and saving their locations. Cones are then placed on top of the markers, as they serve as landmarks for the SLAM algorithms. The car is then tracked as it drives around the cones within the camera's view-space. The accuracy of the ground-truth capturing system was estimated to be around 0.01 m by measuring the real-world distance between markers and comparing it to the distance provided by the system.

3.2.3 Datasets Created

The datasets created include three different maps: a turn, an s-curve and a loop. Figure 3.6 shows the three maps from the location of the camera that was used to create the ground truth. All maps fit inside a 4x4 m square and the color of the cones does not matter.

To record the datasets the car was driven manually between the cones. For the s-curve and turn maps the car was driven from the starting point (bottom-left) to the finish line (top-right), and for the loop map the car was driven one lap around the center. Each map was driven with 3 different power inputs (increased engine power) for the car; the increased engine power did not make the car drive much faster, but caused it to drift during turns. Each map and power input combination was repeated 3 times. Hence, 9 datasets were created for each map, giving twenty-seven datasets in total. Each dataset is approximately 6-11 s long. The following relevant parameters are recorded in each dataset:

• Lidar Scans: The angle and distance to objects that the Lidar detects.

• Odometry from libviso2: Visual odometry provided by libviso2, which can be used in the predict step of SLAM [53].

• Odometry from the vESC: Odometry provided by the motor driver, see Table 3.2.

• Odometry from the ZED: Visual odometry provided by a built-in function in the ZED camera, see Table 3.2.

• IMU measurements: Acceleration and angular rates of the car.

• Ground-truth positions of the cones: The x-y coordinates of all the cones.

• Ground-truth positions of the car: The x-y coordinates of all the car’s positions.

There are approximately 10-20 data points per second per parameter.

Figure 3.6: The three maps that were used to create the datasets: loop (top-left), s-curve (top-right) and turn (bottom-left). The difference in color of the cones does not matter, and the cardboard squares cover markers that were only used when setting up the coordinate system.

3.2.4 SLAM Algorithms Used During Testing

The two algorithms used during testing are EKF-SLAM and FastSLAM, implemented as described by Thrun in [22] and [30]. Both algorithms use the same type of feature extraction, transition model and observation model.

The Lidar was used for feature extraction and it scans at approximately 15 Hz. Each scan was filtered so that only points between 0.3-2.0 m from the Lidar were kept, so as to limit noise. From each scan a clustering extraction is performed: if there are between 4-10 points within a radius of 0.3 m, that cluster center is extracted as a potential landmark.
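A minimal sketch of this clustering step, assuming the scan is given as arrays of beam angles and ranges; the function and parameter names are our own and the single-pass grouping is a simplification rather than the exact implementation used on the car:

```python
import numpy as np

def extract_landmarks(angles, ranges, r_min=0.3, r_max=2.0,
                      cluster_radius=0.3, min_pts=4, max_pts=10):
    """Extract potential landmark positions from one 2-D Lidar scan.

    angles, ranges: equal-length arrays (rad, m) from one revolution.
    Returns a list of (x, y) cluster centers in the sensor frame.
    """
    # Keep only returns between 0.3 m and 2.0 m to limit noise.
    keep = (ranges >= r_min) & (ranges <= r_max)
    pts = np.column_stack((ranges[keep] * np.cos(angles[keep]),
                           ranges[keep] * np.sin(angles[keep])))

    landmarks = []
    used = np.zeros(len(pts), dtype=bool)
    for i in range(len(pts)):
        if used[i]:
            continue
        # Group all not-yet-used points within cluster_radius of this seed.
        dists = np.linalg.norm(pts - pts[i], axis=1)
        members = (dists <= cluster_radius) & ~used
        if min_pts <= members.sum() <= max_pts:
            landmarks.append(pts[members].mean(axis=0))   # cluster center
        used |= members
    return landmarks
```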

The transition model used is an odometry motion model; it is given in Equations 3.8 and 3.9 and is further explained in [54].

\delta_{trans} = \sqrt{(\bar{x}' - \bar{x})^2 + (\bar{y}' - \bar{y})^2}
\delta_{rot1} = \mathrm{atan2}(\bar{y}' - \bar{y}, \bar{x}' - \bar{x}) - \bar{\theta}    (3.8)
\delta_{rot2} = \bar{\theta}' - \bar{\theta} - \delta_{rot1}

x' = x + \delta_{trans} \cos(\theta + \delta_{rot1})
y' = y + \delta_{trans} \sin(\theta + \delta_{rot1})    (3.9)
\theta' = \theta + \delta_{rot1} + \delta_{rot2}

The symbols (\bar{x}, \bar{y}, \bar{\theta}) and (\bar{x}', \bar{y}', \bar{\theta}') are the previous and current coordinates provided by the odometry. The odometry motion model is a three-step process: \delta_{rot1} is the initial rotation of the agent, the agent is then moved \delta_{trans}, and after that the final rotation \delta_{rot2} is applied. At each timestep the motion model thus predicts the new location of the agent using these three steps based on data provided by the odometry. The Stereolabs ZED camera is used to capture images that are processed by a visual odometry algorithm called "libviso2" [53] to calculate these values.
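A direct transcription of Equations 3.8 and 3.9 could look as follows; this is a sketch, with the pose representation and function name chosen for illustration:

```python
import math

def odometry_motion_model(pose, odom_prev, odom_curr):
    """Predict the agent's new pose from two consecutive odometry readings.

    pose:      current pose estimate (x, y, theta)
    odom_prev: previous odometry reading (x_bar, y_bar, theta_bar)
    odom_curr: current odometry reading (x_bar', y_bar', theta_bar')
    """
    xb, yb, tb = odom_prev
    xb2, yb2, tb2 = odom_curr

    # Equation 3.8: decompose the odometry increment into
    # rotation -> translation -> rotation.
    d_trans = math.hypot(xb2 - xb, yb2 - yb)
    d_rot1 = math.atan2(yb2 - yb, xb2 - xb) - tb
    d_rot2 = tb2 - tb - d_rot1

    # Equation 3.9: apply the three steps to the pose estimate.
    x, y, theta = pose
    x += d_trans * math.cos(theta + d_rot1)
    y += d_trans * math.sin(theta + d_rot1)
    theta += d_rot1 + d_rot2
    return (x, y, theta)
```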

The observation model used is a landmark measurement model; it is given in Equation 3.10 and is further explained in [55].

r_t^i = \sqrt{(m_{j,x} - x)^2 + (m_{j,y} - y)^2}    (3.10)
\phi = \mathrm{atan2}(m_{j,y} - y, m_{j,x} - x) - \theta

It calculates the estimated bearing (\phi) and range (r_t^i) to a known landmark based on the pose of the car and the coordinates of the landmark.
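In code, the observation model amounts to a few lines (a sketch of Equation 3.10, with illustrative names):

```python
import math

def landmark_measurement_model(pose, landmark):
    """Expected range and bearing to a known landmark (Equation 3.10)."""
    x, y, theta = pose
    mx, my = landmark
    r = math.hypot(mx - x, my - y)              # expected range r_t^i
    phi = math.atan2(my - y, mx - x) - theta    # expected bearing phi
    return r, phi
```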

EKF-SLAM

In EKF-SLAM the accuracy (i.e. noise) of the motion model and the observation model is modeled with error covariance matrices. The covariance matrices are given in Equation 3.11:

0.01 0 0  0.04 0 R = 0 0.01 0 Q = (3.11)   0 1 0 0 0.5 The R matrix is the covariance associated with the transition model and the Q matrix is the covariance associated with observation model. The values 0.01 in the R matrix means that there is an uncertainty of 0.01 m in the x-y position and the value 0.5 means that there is an uncertainty of 0.5 deg in the angle of the transition model. These values were estimated by experimenta- tion. The value 0.04 in the Q matrix means that there in an uncertainty of i 0.04 m in the range (rt) and the value 1 means that there is an uncertainty of 1 deg in the angle (φ) in the observation model. These values were estimated by looking at the specification of the Lidar used (see Table 3.2) and through experimentation. The threshold used to decide whether a measurement should be seen as a new or old landmark was set to 5.991. This value was estimated through experimentation. In-depth information on the EKF-SLAM algorithm can be found in [22].

FastSLAM 1.0

In FastSLAM 1.0 the accuracy (i.e. noise) of the motion model and the observation model is modeled with error covariance matrices. The covariance matrices are given in Equation 3.12:

0.008 0 0  0.04 0 R = 0 0.008 0 Q = (3.12)   0 1 0 0 0.35 The R matrix is the covariance associated with the transition model and the Q matrix is the covariance associated with observation model. The values 0.008 in the R matrix means that there is an uncertainty of 0.008 m in the x-y position and the value 0.35 means that there is an uncertainty of 0.35 deg in the angle of the transition model. These values were estimated by experimen- tation. The value 0.04 in the Q matrix means that there in an uncertainty of i 0.04 m in the range (rt) and the value 1 means that there is an uncertainty of 1 deg in the angle (φ) in the observation model. These values were estimated CHAPTER 3. METHODS 31

by looking at the specification of the Lidar used (see Table 3.2) and through experimentation. In FastSLAM the values in the R matrix are used to cre- ate the proposal distribution from which it draws particles, this is the reason why there is a slight difference in the R matrix between the two algorithms. This proposal distribution is created by augmenting Equation 3.9 by adding zero white mean Gaussian noise multiplied by its respective R values to each updated parameter. The number of particles used were 2000 and they were re-sampled with systematic re-sampling.
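The two FastSLAM mechanics described above, drawing from the proposal distribution and systematic re-sampling, can be sketched as follows; the names are illustrative and the noise scales are quoted in the same units as Equation 3.12:

```python
import numpy as np

# Per-component noise scales taken from the diagonal of R in Equation 3.12
# (the thesis quotes m for x, y and deg for theta).
R_STD = np.array([0.008, 0.008, 0.35])

def sample_proposal(poses, rng):
    """Perturb the motion-updated particle poses (N x 3 array of x, y, theta)
    with zero-mean Gaussian noise, one draw per component."""
    return poses + rng.normal(0.0, R_STD, size=poses.shape)

def systematic_resample(weights, rng):
    """Return N particle indices drawn by systematic re-sampling."""
    n = len(weights)
    # One random offset, then n evenly spaced positions in [0, 1).
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights) / np.sum(weights)
    return np.searchsorted(cumulative, positions)
```

Re-sampling then reduces to particles = particles[systematic_resample(weights, rng)].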

In EKF-SLAM landmarks are added permanently into the map; in FastSLAM they are not. In FastSLAM, if a landmark is not seen over several iterations even though it is supposedly within range, it is removed from the map. This range is called the perceptual range and was set to 1.25 m. The threshold used to decide whether a measurement should be seen as a new or an old landmark was set to 1. This value was estimated through experimentation. The estimated pose was an average over all particle poses and the map corresponded to the particle with the highest weight. In-depth information on the FastSLAM 1.0 algorithm can be found in [30].

Chapter 4

Results & Discussion

This chapter presents and discusses the results of applying the multi-error function to the output of both EKF-SLAM and FastSLAM for all datasets. Section 4.1 discusses the choice of weights and parameters in the multi-error function. Sections 4.2, 4.3 and 4.4 present and discuss the results of the turn, s-curve and loop datasets respectively. Section 4.5 discusses all results together.

4.1 Weights and Parameters Used in the Error Function

The weights and parameters used in all results are presented in Table 4.1. The parameter d (the maximum distance) was set to 1.5 m based on the resolution and number of datapoints provided by the Lidar, as well as the type of feature extraction used to extract potential landmarks from the Lidar data. Appendix A presents an example of how the distance d can be calculated. The parameter c (the pairing distance) was set to 0.3 m, as this was the approximate distance between two cone centres if they were placed next to each other.

The pose (ep1), maximum pose (ep2), map error (emap) and distance error (edist) weights were all set to 1. The false negative (efn) and false positive (efp) error weights were set to 0.3, as this is the cut-off boundary.


Table 4.1: The parameters and weights used in the error function.

Weights      Value   Description
w1           1       Pose error
w2           1       Maximum pose error
w3           1       Map error
w4           0.3     False negative error
w5           0.3     False positive error
w6           1       Distance error

Parameters   Value   Description
c            0.3     The cut-off boundary for landmark pairing.
d            1.5     The maximum distance.
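For clarity, the Total error is simply the weighted sum of the six sub-errors using the weights of Table 4.1. A minimal sketch of that final summation (the dictionary keys and function name are our own, not taken from the thesis implementation):

```python
# Weights from Table 4.1 (w1..w6); key names are our own.
WEIGHTS = {
    "pose": 1.0,          # w1: average pose error (ep1)
    "max_pose": 1.0,      # w2: maximum pose error (ep2)
    "map": 1.0,           # w3: map error (emap)
    "false_neg": 0.3,     # w4: number of false negatives (efn)
    "false_pos": 0.3,     # w5: number of false positives (efp)
    "distance": 1.0,      # w6: distance error (edist)
}

def total_error(sub_errors):
    """Weighted sum of the raw (unweighted) sub-errors."""
    return sum(WEIGHTS[key] * sub_errors[key] for key in WEIGHTS)
```

The sub-error values passed in are the raw errors; the tables in this chapter report each sub-error after its weight has already been applied.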

4.2 Turn Datasets

Each dataset (twenty-seven in total, 9 for each map) has had EKF-SLAM and FastSLAM applied to it once. Figure 4.1 shows the results of EKF-SLAM and FastSLAM on the dataset "Low-1" (low power, first run). Datasets with the same name but different numbers are different trials with the same power setting; hence "Low-1" and "Low-2" are different trials with the same power setting. The figure also shows the ground-truth map and path, as well as the odometry recorded in the dataset. Each square in the figure represents 1x1 m in real-world coordinates. The FastSLAM pose is the average over all particles at each time-step. Table 4.2 contains each individual error and the total error for EKF-SLAM (explained in section 3.1) and Table 4.3 contains the same for FastSLAM. Something went wrong in the recording of dataset "Low-3", but its entry is kept to show that 3 datasets were recorded for each map and power setting.

The symbols represent the following errors: ep1 is the pose error, ep2 is the maximum pose error, emap is the map error, efn is the false negative error, efp is the false positive error, edist is the distance error and Total is the weighted sum of all errors. Each individual error has been multiplied by its respective weight presented in Table 4.1.

Figure 4.1: The results of running EKF-SLAM (left) and FastSLAM (right) on the turn dataset "Low-1". The lines are the following: green (ground-truth path), black (SLAM path) and red (odometry path). The circles are the following: green (ground-truth landmarks) and black (SLAM landmarks). The black and red arrows are the final SLAM-estimated and odometry angles. The blue dots are the particles.

Table 4.2: The individual errors and the total error for all turn datasets for EKF-SLAM.

Dataset     ep1     ep2     emap    efn    efp    edist   Total
Low-1       0.117   0.260   0.084   0.6    0.6    0.025   1.720
Low-2       0.136   0.295   0.093   0.6    0.6    0.036   1.804
Low-3       -       -       -       -      -      -       -
Medium-1    0.112   0.301   0.086   0.3    0.3    0.038   1.165
Medium-2    0.057   0.233   0.061   0.0    0.0    0.065   0.414
Medium-3    0.099   0.364   0.094   0.3    0.3    0.064   1.228
High-1      0.410   1.252   0.093   3.0    2.7    0.000   7.774
High-2      0.413   1.047   0.132   2.7    2.7    0.061   7.336
High-3      0.217   0.609   0.153   1.8    1.8    0.079   4.724

Table 4.3: The individual errors and the total error for all turn datasets for FastSLAM.

Dataset     ep1     ep2     emap    efn    efp    edist   Total
Low-1       0.171   0.512   0.098   1.5    2.1    0.550   5.006
Low-2       0.175   0.551   0.105   1.5    1.2    1.040   4.641
Low-3       -       -       -       -      -      -       -
Medium-1    0.224   0.662   0.066   2.3    3.0    0.424   6.936
Medium-2    0.245   0.700   0.114   1.8    2.7    0.768   6.461
Medium-3    0.196   0.595   0.059   1.8    4.8    0.000   7.588
High-1      0.326   0.870   0.081   1.2    5.1    0.485   8.309
High-2      0.161   0.463   0.125   1.2    1.8    0.166   3.952
High-3      0.196   0.580   0.104   1.2    6.9    0.250   9.324

4.2.1 Turn Dataset Result Discussion

Figure 4.2 shows the mean (column-wise) and standard deviation (std) for the EKF-SLAM and FastSLAM data presented in Tables 4.2 and 4.3. Table 4.4 shows the proportion of the mean for each error (FastSLAM divided by EKF-SLAM). As described in section 3.5.1, ep1 is the mean square error between the ground-truth pose (only position) and the estimated pose. The pose error is the most common metric for evaluating SLAM and will be used as a baseline to compare the other errors to.

Figure 4.2: The mean and std over all turn datasets for all errors for both EKF-SLAM and FastSLAM.

Considering the results presented in Figure 4.2 and Table 4.4, just looking at the pose error might give the false impression that the two algorithms are close to equal in performance, because the pose error is only 8.5% greater for FastSLAM than for EKF-SLAM. However, the other errors reveal that this might not be the case: the false negative error is 34% greater, the false positive error is 206% greater, the distance error is 900% greater and the total error (the multi-error function) is 99% greater for FastSLAM.

Table 4.4: The proportion of the mean of each error type (FastSLAM divided by EKF-SLAM).

            ep1     ep2     emap    efn    efp    edist   Total
Proportion  1.085   1.131   0.944   1.344  3.066  10.00   1.995

Figure 4.1 shows a visual example of how EKF-SLAM and FastSLAM perform on the same dataset. EKF-SLAM performs very well in the beginning; each ground-truth landmark has a corresponding estimated landmark (although two of them are not within pairing distance because of drift), but both the pose and the map start to drift after the turn. FastSLAM performs well at the start and its final estimated position is much closer to the ground-truth position than EKF-SLAM's. FastSLAM's pose error is just 8.5% greater for this dataset, but its total error is 95% greater. Studying the figure gives an indication as to why: even though there is no noticeable drift in the pose, the landmarks on the right side have drifted. However, as all the estimated landmarks are on the correct "line", they do not impact the pose greatly. Hence, using the Total error instead of just the pose error can give a better idea of the performance difference between two algorithms, as it captures errors that are not properly exhibited in the pose error alone.

4.3 S-curve Datasets

Each dataset (twenty-seven in total, 9 for each map) has had EKF-SLAM and FastSLAM applied to it once. Figure 4.3 shows the results of EKF-SLAM and FastSLAM on the dataset "Low-1" (low power, first run). Datasets with the same name but different numbers are different trials with the same power setting; hence "Low-1" and "Low-2" are different trials with the same power setting. The figure also shows the ground-truth map and path, as well as the odometry recorded in the dataset. Each square in the figure represents 1x1 m in real-world coordinates. Table 4.5 contains each individual error and the Total error (explained in section 3.1) for EKF-SLAM and Table 4.6 contains the same for FastSLAM.

Figure 4.3: The results of running EKF-SLAM (left) and FastSLAM (right) on the s-curve dataset "Low-1". The lines are the following: green (ground-truth path), black (SLAM path) and red (odometry path). The circles are the following: green (ground-truth landmarks) and black (SLAM landmarks). The black and red arrows are the final SLAM-estimated and odometry angles. The blue dots are the particles.

Table 4.5: The individual errors and the total error for all s-curve datasets for EKF-SLAM.

Dataset     ep1     ep2     emap    efn    efp    edist   Total
Low-1       0.120   0.288   0.106   0.9    0.6    0.050   2.080
Low-2       0.237   0.609   0.086   2.4    2.4    0.088   5.974
Low-3       0.222   0.582   0.105   2.4    2.4    0.086   5.914
Medium-1    0.229   0.537   0.060   2.7    2.4    0.021   6.118
Medium-2    0.079   0.198   0.110   1.2    0.3    0.078   1.936
Medium-3    0.086   0.299   0.132   0.0    0.0    0.068   0.539
High-1      0.274   0.905   0.076   2.1    1.8    0.000   5.354
High-2      0.099   0.281   0.094   0.6    0.3    0.072   1.452
High-3      0.257   0.717   0.130   1.8    2.1    0.166   5.298

Table 4.6: The individual errors and the total error for all s-curve datasets for FastSLAM.

Dataset     ep1     ep2     emap    efn    efp    edist   Total
Low-1       0.299   0.686   0.085   3.3    2.7    0.206   7.492
Low-2       0.195   0.517   0.103   1.8    0.3    1.078   4.087
Low-3       0.236   0.595   0.101   3.0    1.2    0.757   6.026
Medium-1    0.295   0.690   0.142   2.4    4.5    0.023   8.205
Medium-2    0.194   0.696   0.040   3.6    3.3    0.000   7.984
Medium-3    0.212   0.614   0.105   2.7    2.7    0.829   7.268
High-1      0.265   0.691   0.128   3.0    2.1    0.046   6.368
High-2      0.253   0.637   0.151   2.1    4.5    0.546   8.290
High-3      0.291   0.751   0.076   3.6    2.4    0.000   7.334

4.3.1 S-curve Dataset Result Discussion

Figure 4.4 shows the mean (column-wise) and std for the EKF-SLAM and FastSLAM data presented in Tables 4.5 and 4.6. Table 4.7 shows the proportion of the mean for each error.

Figure 4.4: The mean and std over all s-curve datasets for all errors for both EKF-SLAM and FastSLAM.

Considering the results shown in Figure 4.4 and Table 4.7, FastSLAM's pose error is 39.7% greater and its maximum pose error is 33% greater than EKF-SLAM's. Compared to the turn datasets, the pose and maximum pose errors are much greater. However, the pose error might still not accurately show the performance difference between the two algorithms, as the false negative error is 80.8% greater, the false positive error is 92.6% greater, the distance error is 454% greater and the total error is 81.9% greater. So even though its pose error is 39.7% greater, the FastSLAM algorithm might actually perform even worse than that. By incorporating the other errors, the true nature of the SLAM performance might be better understood.

Table 4.7: The proportion of the mean of each error type (FastSLAM divided by EKF-SLAM).

            ep1     ep2     emap    efn    efp    edist   Total
Proportion  1.397   1.330   1.035   1.808  1.926  5.540   1.819

Figure 4.3 shows a visual example of how EKF-SLAM and FastSLAM perform on the same s-curve dataset, and the figure and the data support the previous statement. They show that FastSLAM performs slightly worse than EKF-SLAM when looking at the pose, but that its map is considerably worse. Several landmarks do not appear in the final map and three landmarks have two estimated landmark counterparts each. From the figure it appears that FastSLAM would be close to unusable, but from the pose error alone this might not be sufficiently apparent.

4.4 Loop Datasets

Each dataset (twenty-seven in total, 9 for each map) has had EKF-SLAM and FastSLAM applied to it once. Figure 4.5 shows the results of EKF-SLAM and FastSLAM on the dataset "Low-1" (low power, first run). Datasets with the same name but different numbers are different trials with the same power setting; hence "Low-1" and "Low-2" are different trials with the same power setting. The figure also shows the ground-truth map and path, as well as the odometry recorded in the dataset. Each square in the figure represents 1x1 m in real-world coordinates. Table 4.8 contains each individual error and the Total error (explained in section 3.1) for EKF-SLAM and Table 4.9 contains the same for FastSLAM.

Figure 4.5: The results of running EKF-SLAM (left) and FastSLAM (right) on the loop dataset "Low-1". The lines are the following: green (ground-truth path), black (SLAM path) and red (odometry path). The circles are the following: green (ground-truth landmarks) and black (SLAM landmarks). The black and red arrows are the final SLAM-estimated and odometry angles. The blue dots are the particles.

Table 4.8: The individual errors and the total error for all loop datasets for EKF-SLAM.

Dataset     ep1     ep2     emap    efn    efp    edist   Total
Low-1       0.121   0.400   0.154   0.0    0.0    0.084   0.728
Low-2       0.149   0.338   0.150   0.0    0.0    0.048   0.686
Low-3       0.228   0.472   0.162   2.4    2.4    0.070   5.800
Medium-1    0.204   0.667   0.140   2.1    2.1    0.077   5.355
Medium-2    0.156   0.413   0.127   0.0    0.0    0.110   0.836
Medium-3    0.132   0.305   0.132   0.0    0.0    0.059   0.631
High-1      0.175   0.417   0.182   0.0    0.3    0.082   1.150
High-2      0.210   0.431   0.175   1.5    1.5    0.083   3.936
High-3      0.163   0.391   0.157   0.6    0.6    0.035   1.953

Table 4.9: The individual errors and the total error for all loop datasets for FastSLAM.

Dataset     ep1     ep2     emap    efn    efp    edist   Total
Low-1       0.225   0.485   0.170   2.7    2.1    1.098   6.835
Low-2       0.228   0.552   0.066   3.3    2.4    0.312   7.022
Low-3       0.233   0.496   0.135   1.5    2.1    0.443   5.007
Medium-1    0.364   0.760   0.212   3.6    0.9    0.892   6.881
Medium-2    0.242   0.775   0.135   2.7    2.1    0.102   5.262
Medium-3    0.236   0.516   0.119   2.7    2.7    0.550   6.940
High-1      0.341   0.759   0.050   3.6    2.4    0.000   7.442
High-2      0.250   0.629   0.144   2.7    2.7    0.556   7.076
High-3      0.294   0.642   0.146   2.1    0.9    0.850   5.081

4.4.1 Loop Dataset Result Discussion

Figure 4.6 shows the mean (column-wise) and std for the EKF-SLAM and FastSLAM data presented in Tables 4.8 and 4.9. Table 4.10 shows the proportion of the mean for each error.

Figure 4.6: The mean and std over all loop datasets for all errors for both EKF-SLAM and FastSLAM.

Considering the results shown in Figure 4.6 and Table 4.10, FastSLAM's pose error is 56.8% greater and its maximum pose error is 46.4% greater than EKF-SLAM's. Compared to the turn and s-curve datasets, the pose and maximum pose error percentages are greater. However, even though they are greater they still might not provide an accurate depiction of how much worse FastSLAM performs than EKF-SLAM on these datasets: the false negative error is 277% greater, the false positive error is 165.2% greater, the distance error is 651% greater and the total error is 273% greater.

Table 4.10: The proportion of the mean of each error type (FastSLAM divided by EKF-SLAM).

            ep1     ep2     emap    efn    efp    edist   Total
Proportion  1.568   1.464   0.853   3.772  2.652  7.412   2.730

Figure 4.5 shows a visual example of how EKF-SLAM and FastSLAM perform on the same loop dataset. If one were to study only the pose error, both in the figure and as a numerical value, the performance difference between the algorithms might not appear that great. However, the final map in the figure shows that EKF-SLAM performs much better than FastSLAM, because FastSLAM has a much larger number of false positive and false negative errors.

4.5 All Results & Multi-Error Function Discussion

The objective of this thesis was to create a multi-error function that incorporates several different types of errors into a single formula that can be used for establishing SLAM quality. We ran EKF-SLAM and FastSLAM on several datasets to gain insight into how they impacted the multi-error function and how the different sub-errors were expressed in the two algorithms.

The proportion of the mean (FastSLAM divided by EKF-SLAM) of the map error (emap) for the three map types is 0.944, 1.035 and 0.853. That is, FastSLAM performs better than or close to equal to EKF-SLAM in this error type, even though it performs worse to significantly worse in the other sub-error types. As described in section 3.5.3, the map error is the mean difference over all paired landmarks. For this error measurement to have any significance, a large portion of all ground-truth landmarks must be paired; otherwise very few landmarks count towards this error, which makes it deceptive. This is why the FastSLAM algorithm performs well in this sub-error: often, very few of its estimated landmarks are within pairing distance, so very few of them are accounted for in the map error.

Also, the size of each individual map error is capped at the pairing distance.

The proportion of the mean (FastSLAM divided by EKF-SLAM) for the distance error (edist) for the three map types is 10.0, 5.540 and 7.412. The interpretation of these results is that FastSLAM detects objects much later than EKF-SLAM, even though both algorithms use the same sensors and feature extraction. This is probably because FastSLAM uses a counter for each landmark while EKF-SLAM does not. The counter is decreased for a landmark if it was not seen when it should have been. What happens is that landmarks are continuously added to and deleted from the map because of poor feature extraction. The distance error is thereby increased, as only the final map is used in its calculation. A couple of datasets have a distance error equal to zero ("s-curve High-1" and "s-curve Medium-2"). This does not mean that all landmarks were seen at or beyond the maximum detection range; it is rather an indication that the SLAM algorithm performed poorly. As with the map error, only a paired landmark's corresponding estimated landmark is counted in the distance error.

The hypothesis stated in the beginning of this thesis is: "The choice of how the quality of SLAM is evaluated impacts which algorithm appears the most suitable for a given scenario." This hypothesis is evaluated by comparing the pose error (ep1, the most common way of evaluating SLAM quality) to the multi-error function's Total error for all datasets between EKF-SLAM and FastSLAM. In all datasets except for 3 ("turn High-1", "turn High-3" and "s-curve High-1"), the algorithm with the lowest pose error also had the lowest Total error. This is an indication that it might be unnecessary to use a more sophisticated method to calculate the error when deciding between two algorithms. In Figure 4.7 the pose error and Total error are plotted for all datasets for both SLAM algorithms. It shows how heavily the two types of errors are correlated, which is another indication that a more sophisticated method might be unnecessary. The solid blue (pose error) and solid orange (Total error) lines show the errors of EKF-SLAM and exhibit an obvious correlation. The same holds true for FastSLAM (the striped blue and orange lines).

This is not a surprising result, as all the sub-errors (except the pose errors) indirectly impact the pose error. The false negative error, the false positive error, the map error and the distance error all affect the pose error in some fashion: missed landmarks give the algorithm less information for correcting its pose; false landmarks provide incorrect information which might cause the pose to be moved incorrectly; estimated landmarks closer to the ground-truth landmark positions provide a more accurate pose; and the earlier landmarks are added into the map, the earlier they can be used to estimate the movement of the agent.

Figure 4.7: The pose (ep1) and Total error for EKF-SLAM and FastSLAM for all datasets.

In general, it appears that just looking at the pose error is a good enough indicator of SLAM quality. However, some results indicate that the SLAM quality is not properly exhibited purely by the pose error. This is especially clear from the discussion in section 4.2.1 and Table 4.4, where the mean pose error for FastSLAM is just slightly higher than for EKF-SLAM, but all other sub-errors and the Total error are significantly higher. Examining a couple of datasets in detail: running FastSLAM on the two datasets "turn High-2" and "turn High-3" gives the pose errors 0.162 m and 0.196 m, a difference of just 0.034 m. However, their Total errors are 3.952 and 9.324; the second is more than twice as large. Studying the sub-errors shows that the large difference in Total error comes from the large difference in the false positive error, which is 1.8 and 6.9 respectively. Other datasets show similar behaviour, such as running EKF-SLAM on "loop Medium-1" and "loop Medium-2". SLAM stands for simultaneous localization and mapping, and from these results it appears that even though the localization part (the pose error) shows good performance, this is not always an indication that the mapping is working correctly. That said, the two are often tied together: good mapping is needed for good localization and good localization is needed for good mapping. In the datasets recorded, the odometry performed exceptionally well, which might be why FastSLAM has such a low pose error considering how poor its map is in many of the datasets.

The objective of the thesis was to create a multi-error function and evaluate it. This has been done by creating several datasets, running two different SLAM algorithms on them and evaluating the results with the multi-error function. Even though the results show that the pose error is often a good enough indicator of SLAM quality, we have shown that this is not always the case. The Total error provided by the multi-error function is a better indicator than the pose error alone, as it captures the mapping process in much more detail. This is important because a SLAM algorithm is often not used independently; applications such as path planning and obstacle avoidance in AVs use the location and map provided by the SLAM algorithm. Depending on the application the SLAM algorithm will be used for, different errors will matter differently.

Chapter 5

Future Work & Conclusions

This chapter discusses potential future work and the final conclusions. Sec- tion 5.1 presents the final conclusions of the thesis and section 5.2 presents potential future work.

5.1 Conclusions

A multi-error function for establishing SLAM quality has been developed and tested in this thesis. The results indicate that in general the pose error is a good enough indicator for finding the superior algorithm, because the pose error is heavily correlated with many of the sub-errors in the multi-error function. However, the results also show that a low pose error can be misleading and does not always indicate that the algorithm is working properly; in some cases a low pose error does not mean that errors relating to the map are proportionately as low. Hence, using the developed multi-error function is superior to looking only at the pose error, as it captures errors relating to the mapping process much better. It does, however, require that the datasets are labelled with what are supposed to be the landmarks used by the algorithm.

The final conclusions of this thesis are: the pose error is often a good enough indicator of SLAM quality, but it can occasionally be misleading with respect to errors relating to the mapping. By using the developed multi-error function, errors relating to the mapping will be detected. This in turn allows the user to make a more informed choice of whether a SLAM algorithm is suitable for a particular application.


5.2 Future Work

The multi-error function developed in this thesis cannot be used for all SLAM algorithms, only for algorithms that use specific landmarks; e.g. it cannot be used for evaluating ORB-SLAM and many other visual-based algorithms that use generic features (see Appendix C) instead of specific landmarks. Further research is needed to find a more in-depth error measure than the pose error that can be used for evaluating all SLAM algorithms, or at least a greater number of them.

In the introduction of this thesis the question "is SLAM solved?" was presented, and Cadena et al. [10] answer that this question cannot be answered without specifying a robot/environment/performance combination. We think "application" can be added to that list, as the needs placed on the SLAM algorithm can vary greatly depending on the application. Very often the SLAM algorithm is not used independently; other systems subscribe to the pose and map it generates. The difference in expected performance and reliability of a SLAM algorithm used in e.g. an autonomous underwater submarine, an autonomous car or a robot vacuum cleaner is huge. Hence, we think that ways of measuring SLAM quality that are based more on the intended application need further research: evaluation methods relating to the number of sensors needed and their cost, the computational time, the reliability of the sensors and the algorithm, how sensor failures affect it, and any number of interesting properties beyond the pose error. Presenting yet another SLAM algorithm where the only error measurement is its pose error might not be that useful or interesting any more.

Bibliography

[1] L. M. Clements and K. M. Kockelman, “Economic effects of automated vehicles,” Transportation Research Record, vol. 2606, no. 1, pp. 106– 114, 2017.

[2] M. Ahman, "Volvo testar självkörande bilar," Website: Recharge, 09 2016, 16. [Online]. Available: https://www.mestmotor.se/recharge/artiklar/nyheter/20160916/volvo-testar-sjalvkorande-lastbilar-i-gruvan/

[3] E. Vaish, “Volvo’s self-driving car venture gets nod to test on swedish roads,” Website: Reuters, 01 2019, 28. [Online]. Available: https://www.reuters.com/article/us-volvo- autonomous/volvos-self-driving-car-venture-gets-nod-to-test-on- swedish-roads-idUSKCN1PM1J2

[4] Itf, Automated and Autonomous Driving: Regulation under Uncertainty. OECD Publishing, 2015.

[5] M. DeBord, "We tested tesla's, cadillac's, and mercedes' semi-autonomous driving systems — and the winner is clear," Website: Business Insider, 07 2018, 16. [Online]. Available: https://www.businessinsider.com/best-self-driving-systems-in-cars-compared-2017-12?r=USIR=Ttesla-autopilot-1

[6] F. T. Council, “Looking ahead: 11 predictions on how drone deliveries will work,” Website: Forbes, 10 2018, 04. [Online]. Available: https://www.forbes.com/sites/forbestechcouncil/2018/10/04/looking- ahead-11-predictions-on-how-drone-deliveries-will- work/39c0785251b3

[7] R. Lanctot, “Accelerating the future: The economic impact of the emerg- ing passenger economy,” 2017.


[8] H. Durrant-Whyte and T. Bailey, "Simultaneous localization and mapping: part i," Robotics and Automation Magazine, IEEE, vol. 13, no. 2, pp. 99–110, 2006.

[9] T. Bailey and H. Durrant-Whyte, "Simultaneous localization and mapping (slam): part ii," IEEE Robotics and Automation Magazine, vol. 13, no. 3, pp. 108–117, 2006.

[10] C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. Reid, and J. J. Leonard, "Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age," Robotics, IEEE Transactions on, vol. 32, no. 6, pp. 1309–1332, 2016.

[11] R. Mur-Artal and J. Tardos, "Orb-slam2: An open-source slam system for monocular, stereo, and rgb-d cameras," IEEE Transactions on Robotics, vol. 33, no. 5, pp. 1255–1262, 2017.

[12] A. Concha and J. Civera, “Dpptam: Dense piecewise planar tracking and mapping from a monocular sequence,” vol. 2015-. Institute of Electrical and Electronics Engineers Inc., 2015, pp. 5686–5693.

[13] B. Xu, Z. Liu, Y. Fu, and C. Zhang, "Research of cartographer laser slam algorithm," vol. 10605. SPIE, 2017, pp. 1060509-1–1060509-9.

[14] A. Filatov, A. Filatov, K. Krinkin, B. Chen, and D. Molodan, “2d slam quality evaluation methods.” FRUCT, 2017, pp. 120–126.

[15] G. Rapier, “Self-driving cars could wipe out 4 million jobs — but a new report says the upsides will be easily worth it,” Website: MARKETS INSIDER, 06 2018, 13. [Online]. Available: https://markets.businessinsider.com/news/stocks/self- driving-cars-could-kill-4-million-jobs-economic-impact-worth-it-2018- 6-1026937775

[16] C. Reinicke, “Autonomous vehicles won’t only kill jobs. they will create them, too,” Website: CNBC, 08 2018, 11. [Online]. Available: https://www.cnbc.com/2018/08/10/autonomous-vehicles- are-creating-jobs-heres-where.html

[17] M. Bertoncello and D. Wee, "Ten ways autonomous driving could redefine the automotive world," Website: McKinsey and Company, 06 2015, 11. [Online]. Available:

https://www.mckinsey.com/industries/automotive-and-assembly/our- insights/ten-ways-autonomous-driving-could-redefine-the-automotive- world

[18] L. D’Olimpio, “The trolley dilemma: would you kill one person to save five?” Website: TheConversation, 06 2016, 03. [Online]. Available: http://theconversation.com/the-trolley-dilemma-would-you- kill-one-person-to-save-five-57111

[19] A. Maxmen, “Self-driving car dilemmas reveal that moral choices are not universal,” Nature, vol. 562, no. 7728, pp. 469–470, 2018.

[20] J. Park, "Sensemaking / what will autonomous vehicles mean for sustainability?" Website: The Futures Centre, 02 2017, 27. [Online]. Available: https://thefuturescentre.org/articles/11010/what-will-autonomous-vehicles-mean-sustainability

[21] U.S. Government, "GPS accuracy," Website: GPS.gov, 03 2019, 16. [Online]. Available: https://www.gps.gov/systems/gps/performance/accuracy/

[22] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics, 1st ed. The MIT Press, 2006, ch. 10.

[23] ——, Probabilistic Robotics, 1st ed. The MIT Press, 2006, ch. 9.

[24] J. Neira and J. Tardos, "Data association in stochastic mapping using the joint compatibility test," IEEE Transactions on Robotics and Automation, vol. 17, no. 6, pp. 890–897, 2001.

[25] P. Moutarlier and R. Chatila, An experimental system for incremental environment modelling by an autonomous mobile robot, 04 2006, pp. 327–346.

[26] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics, 1st ed. The MIT Press, 2006, ch. 3.

[27] ——, Probabilistic Robotics, 1st ed. The MIT Press, 2006, ch. 13.

[28] A. Doucet, N. de Freitas, K. Murphy, and S. Russell, "Rao-blackwellised particle filtering for dynamic bayesian networks," 2013.

[29] M. Montemerlo, S. Thrun, and W. Whittaker, “Conditional particle filters for simultaneous mobile robot localization and people-tracking,” vol. 1. USA: IEEE, 2002, pp. 695–701.

[30] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics, 1st ed. The MIT Press, 2006, ch. 13.

[31] C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. Reid, and J. J. Leonard, “Slam: Towards the robust-perception age,” Website: Git Hub, 07 2018, 16. [Online]. Available: https://slam- future.github.io/

[32] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The kitti dataset,” The International Journal of Robotics Research, vol. 32, no. 11, pp. 1231–1237, 2013.

[33] M. Burri, J. Nikolic, P. Gohl, T. Schneider, J. Rehder, S. Omari, M. W. Achtelik, and R. Siegwart, "The euroc micro aerial vehicle datasets," The International Journal of Robotics Research, vol. 35, no. 10, pp. 1157–1163, 2016.

[34] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A benchmark for the evaluation of rgb-d slam systems.” IEEE, 2012, pp. 573–580.

[35] B. Koch, R. Leblebici, A. Martell, S. Jörissen, K. Schilling, and A. Nüchter, "Evaluating continuous-time slam using a predefined trajectory provided by a robotic arm," vol. 4, no. 2. Copernicus GmbH, 2017, pp. 91–98.

[36] A. Schmidt, M. Fularz, M. Kraft, A. Kasiński, and M. Nowicki, “An indoor rgb-d dataset for the evaluation of algorithms,” vol. 8192. Springer Verlag, 2013, pp. 321–329.

[37] J. Bayer, P. Čížek, and J. Faigl, “On construction of a reliable ground truth for evaluation of visual slam algorithms,” Acta Polytechnica CTU Proceedings, vol. 6, pp. 1–5, 2016. [Online]. Available: https://doaj.org/article/1787ff4172654dc1b111142559b76565

[38] D. Vivet, P. Checchin, and R. Chapuis, "Slam with slow rotating range sensors. evaluation of line-based map quality," in ICARCV, 2010.

[39] W. Hess, D. Kohler, H. Rapp, and D. Andor, “Real-time loop closure in 2d lidar slam,” in 2016 IEEE International Conference on Robotics and Automation (ICRA), vol. 2016-. IEEE, 2016, pp. 1271–1278.

[40] A. Huletski, D. Kartashov, and K. Krinkin, “Vinyslam: An indoor slam method for low-cost platforms based on the transferable belief model,” vol. 2017-. Institute of Electrical and Electronics Engineers Inc., 2017, pp. 6770–6776.

[41] R. Kümmerle, B. Steder, C. Dornhege, M. Ruhnke, G. Grisetti, C. Stachniss, and A. Kleiner, "On measuring the accuracy of slam algorithms," Autonomous Robots, vol. 27, no. 4, pp. 387–407, 2009.

[42] I. Varsadan, A. Birk, and M. Pfingsthorn, “Determining map quality through an image similarity metric,” vol. 5399, 2009, pp. 355–365.

[43] A. Birk, “Learning geometric concepts with an evolutionary algorithm,” in Evolutionary Programming, 1996.

[44] D. Schuhmacher, B.-T. Vo, and B.-N. Vo, "A consistent metric for performance evaluation of multi-object filters," Signal Processing, IEEE Transactions on, vol. 56, no. 8, pp. 3447–3457, 2008.

[45] P. Barrios, M. Adams, K. Leung, F. Inostroza, G. Naqvi, and M. E. Orchard, “Metrics for evaluating feature-based mapping performance,” Robotics, IEEE Transactions on, vol. 33, no. 1, pp. 198–213, 2017.

[46] F. Amigoni, M. Reggiani, and V. Schiaffonati, “An insightful comparison between experiments in mobile robotics and in science,” Autonomous Robots, vol. 27, no. 4, p. 313, Aug 2009. [Online]. Available: https://doi.org/10.1007/s10514-009-9137-8

[47] M. d. l. I. Valls, H. F. C. Hendrikx, V. Reijgwart, F. V. Meier, I. Sa, R. Dubé, A. R. Gawel, M. Bürki, and R. Siegwart, "Design of an autonomous racecar: Perception, state estimation and system integration," 2018.

[48] H. Kuhn, "The hungarian method for the assignment problem," in 50 Years of Integer Programming 1958-2008: From the Early Years to the State-of-the-Art. Springer Berlin Heidelberg, 2010, pp. 29–47.

[49] S. Wen, M. Sheng, C. Ma, Z. Li, H. Lam, Y. Zhao, and J. Ma, "Camera recognition and laser detection based on ekf-slam in the autonomous navigation of humanoid robot," Journal of Intelligent and Robotic Systems, vol. 92, no. 2, pp. 265–277, 2018.

[50] J. Cressell and I. Tornberg, “A combination of object recognition and localisation for an autonomous racecar,” Master Thesis, KTH, 2018.

[51] T. Krajník, M. Nitsche, J. Faigl, P. Vaněk, M. Saska, L. Přeučil, T. Duckett, and M. Mejail, “Whycon: A precise, efficient and low-cost localization system,” Website: Git Hub, 10 2018, 05. [Online]. Available: https://github.com/gestom/whycon-orig

[52] ——, “A practical multirobot localization system,” Journal of Intelligent and Robotic Systems, vol. 76, no. 3, pp. 539–562, 2014.

[53] A. Geiger, J. Ziegler, and C. Stiller, "Stereoscan: Dense 3d reconstruction in real-time," in Intelligent Vehicles Symposium (IV), 2011.

[54] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics, 1st ed. The MIT Press, 2006, ch. 5.

[55] ——, Probabilistic Robotics, 1st ed. The MIT Press, 2006, ch. 5.

[56] D. Lowe, “Object recognition from local scale-invariant features,” in Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2. IEEE, 1999, pp. 1150–1157 vol.2.

[57] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool, "Speeded-up robust features (surf)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008, Similarity Matching in Computer Vision and Multimedia. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1077314207001555

[58] E. Rosten and T. Drummond, “Machine learning for high-speed corner detection,” vol. 3951. Springer Verlag, 2006, pp. 430–443.

[59] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, “Orb: An efficient alternative to sift or surf,” in 2011 International Conference on Computer Vision. IEEE, 2011, pp. 2564–2571.

[60] P. Nunez, R. Vazquez-Martin, J. Del Toro, A. Bandera, and F. Sandoval, "Feature extraction from laser scan data based on curvature estimation for

mobile robotics,” in Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006, vol. 2006. IEEE, 2006, pp. 1167–1172.

[61] A. Akbar Aghamohammadi, H. Taghirad, A. Hossein Tamjidi, and E. Mihankhah, “Feature-based laser scan matching for accurate and high speed mobile robot localization.” 01 2007.

[62] M. Fischler and R. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.

Appendix A

An example of how to calculate the error distance d if the Lidar in Table 3.2 is used for feature extraction and landmark detection. It is a 2-D Lidar that returns 360 measurements per revolution, i.e. it has a resolution of 1°. Assume that the landmarks are cones with a diameter of l and that at least five laser points need to hit the cone during one revolution of the Lidar for the cone to be detected. The maximum distance d is then calculated as in Figure A.1 and the solution is given in Equation A.1. \varphi is the angle spanned by five Lidar points, d is the distance from the Lidar to the cone's center and l is the diameter of the cone.

Figure A.1: The point to the left is the Lidar and the circle to the right is the cone.

\tan\left(\frac{\varphi}{2}\right) = \frac{l}{2} \cdot \frac{1}{d} \;\Rightarrow\; d = \frac{l}{2} \cdot \frac{1}{\tan(\varphi / 2)}    (A.1)

l = 0.1 \text{ m}, \quad \varphi = 4 \cdot \frac{\pi}{180}, \quad d = 1.43 \text{ m}
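The same calculation can be verified in a few lines (a sketch using the values stated above):

```python
import math

l = 0.1                   # cone diameter (m)
phi = math.radians(4)     # five beams at 1 deg resolution span 4 deg
d = (l / 2) / math.tan(phi / 2)
print(f"{d:.2f} m")       # prints 1.43 m
```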

Appendix B

Sensors

To perform SLAM a range of sensors are used, each with its own strengths and weaknesses. Depending on the sensor, it can be used in the predict step of the algorithm, the update step, or both. The following list contains some of the common sensors used in different SLAM systems.

• IMU: An Inertial Measurement Unit (IMU) is an electronic device that is used to measure the forces acting on an agent (or any object). It usually consists of two devices, an accelerometer and a gyroscope. Accelerometers are used to measure the acceleration in the x, y and z directions of an agent. A gyroscope is used to measure the angular velocity of an agent. By combining these two sensors the IMU is able to measure the acceleration of an agent and thereby estimate its velocity. An IMU can be used for odometry, which is the use of data from motion sensors to predict the agent's change in position over time. Therefore, an IMU can be used in the predict step of SLAM for estimating the agent's next potential location.

• Hall Effect Sensor: A Hall effect sensor is used to measure the strength of a magnetic field; it outputs a voltage in direct relation to the strength of the field. Hall effect sensors have many applications and are often used for measuring the speed of wheels. Hence, a Hall effect sensor can be used to estimate the velocity of an agent's wheels and, together with a transition model, it can be used in the predict step of SLAM.

• Monocular Camera: A monocular camera is a type of camera with only one entrance point (one image sensor) and produces a 2-dimensional image. These sensors can be used in visual SLAM. As they lack depth information, algorithms that match features (such as corners or edges) between frames can be used, together with the velocity of the agent, to estimate the depth of images and the distance between two features.

• Stereo Camera: A stereo camera is a type of camera with two entrance points (two image sensors), which allows it to simulate binocular vision and produce 3-dimensional images. The 3-dimensional image is created by matching features found in the "left" and "right" cameras together with the real-world distance between the two image sensors.

• RGB-D Camera: A Red-Green-Blue-Distance (RGB-D) camera is a camera that provides both the color and the distance of each pixel. The depth can be captured by projecting an infra-red pattern and capturing it with an infra-red camera. By comparing the projected and captured patterns the depth of each pixel can be inferred.

• Lidar: A Lidar, often called a laser scanner, captures the distance to targets by illuminating the target with pulsed laser light and measuring the reflected pulses. By comparing the return times and wavelengths, a 2-dimensional or 3-dimensional representation of the environment can be created.

Appendix C

Feature Extraction

In each update step of the SLAM algorithm all potential new features (or landmarks) are associated with the current features to estimate whether they are new or old features. The features used by the SLAM algorithm must first be extracted from sensor readings. The two most common sensor types used for this are laser and camera based. Depending on the environment, the sensors available and the SLAM algorithm used, the features of interest change. In visual SLAM, four common feature extractors are: Scale Invariant Feature Transform (SIFT) [56], Speeded Up Robust Features (SURF) [57], Features from Accelerated Segment Test (FAST) [58] and Oriented FAST and Rotated BRIEF (ORB) [59].

In SIFT, features are extracted by convolving the input image with a Gaussian for several different variances, taking the difference between these images and finding the local extrema. Additional calculations are applied to these extreme points to determine whether they are features or not. Each feature (often called a keypoint) is then given the following description: location, scale, radius and orientation. SURF is based on SIFT; the steps are the same, but they are carried out differently. Instead of filtering with a Gaussian, the image is filtered with a window-based method that approximates a Gaussian, and blob detection is used instead of finding extrema. Each feature is described by location, scale and orientation. FAST is a very fast corner detector; a pixel is classified as a corner depending on certain criteria. ORB uses a version of FAST that also includes the orientation of a corner, and each feature is described by BRIEF. Depending on the SLAM algorithm and its map representation, it can use the features supplied by these algorithms directly, or it can use landmarks extracted from clusters of these features.

P. Nunez et al. [60] present a geometrical feature detection method for 2-D laser range-finders (i.e. Lidars). They are able to extract and characterize line segments, corners and curve segments. Depending on the SLAM implementation, these features can be used as landmarks. Aghamohammadi et al. [61] extract corners and "jump-edges" from a 2-D laser range-finder, both of which can be used as landmarks. Edges and corners are common features to extract from 2-D Lidars, as they are common landmarks in many environments; they can represent walls, houses or corners in rooms and buildings. A common algorithm for finding lines (walls) or circles is RANSAC [62]. It is an iterative method used to estimate the parameters of a mathematical model from noisy data.
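As an illustration, a minimal RANSAC line fit over 2-D scan points could look as follows; the parameter values and names are ours, not taken from [62]:

```python
import numpy as np

def ransac_line(points, n_iter=200, inlier_tol=0.05, rng=None):
    """Fit a 2-D line (e.g. a wall) to noisy scan points with RANSAC.

    points: N x 2 array. Returns (point_on_line, unit_direction, inlier_mask).
    """
    rng = rng or np.random.default_rng()
    best_mask, best_count = None, 0
    best_p = best_dir = None
    for _ in range(n_iter):
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        if np.allclose(p, q):
            continue                      # degenerate sample, try again
        direction = (q - p) / np.linalg.norm(q - p)
        normal = np.array([-direction[1], direction[0]])
        # Perpendicular distance of every point to the candidate line.
        dist = np.abs((points - p) @ normal)
        mask = dist < inlier_tol
        if mask.sum() > best_count:
            best_mask, best_count = mask, int(mask.sum())
            best_p, best_dir = p, direction
    return best_p, best_dir, best_mask
```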

The feature extraction step does not have to be only visual or Lidar based. In [50] they used a Lidar and a camera running a neural network trained to recognize cones; a cone was only added into the map if both the Lidar feature extraction and the neural network saw a cone. In [49] they used a camera and an object recognition algorithm to detect whether any landmarks were in view; if a landmark was detected with the camera, a laser was used to measure the distance to the landmark.

TRITA-EECS-EX-2019:478
