A ROS-BASED TOY-CAR DETECT-AND-PLACE DOMESTIC ROBOT

A Thesis

Presented to the

Faculty of

California State Polytechnic University, Pomona

In Partial Fulfillment

Of the Requirements for the Degree

Master of Science

In

Mechanical Engineering

By

Yifan Wang

2021

SIGNATURE PAGE

THESIS: A ROS-BASED TOY-CAR DETECT-AND-PLACE DOMESTIC ROBOT

AUTHOR: Yifan Wang

DATE SUBMITTED: Spring 2021

Department of Mechanical Engineering

Dr. Yizhe Chang Thesis Committee Chair Mechanical Engineering

Dr. Campbell A. Dinsmore Mechanical Engineering

Dr. Nolan E. Tsuchiya Mechanical Engineering


ABSTRACT

Robot Operating System (ROS) is an open-source framework with a collection of tools, libraries, and conventions that aim to simplify the task of creating complex and robust robot behavior across a wide variety of robotic platforms.

The study in this thesis encompassed the integration of a LiDAR, cameras, an iRobot Create 2, and ROS to implement simultaneous localization and mapping (SLAM), autonomous navigation, and real-time object detection. A robot that detects a specific object, toy cars, and places them at a designated goal was developed. The odometry from the mobile base and the LiDAR input were utilized for SLAM and navigation. The camera inputs were used for object detection. The performance of the system was evaluated by operation time, detection accuracy, and operation success rate. The results demonstrated the implementation of the intended robot system in ROS.


TABLE OF CONTENTS

SIGNATURE PAGE ...... ii

ABSTRACT ...... iii

LIST OF TABLES ...... vii

LIST OF FIGURES ...... viii

CHAPTER 1: INTRODUCTION ...... 1

1.1 Background ...... 1

1.2 Objectives ...... 2

1.3 Thesis Organization ...... 2

CHAPTER 2. THEORY ...... 4

2.1 ROS ...... 4

2.2 Simultaneous Localization and Mapping (SLAM) ...... 6

2.2.1 Gmapping ...... 8

2.3 Autonomous Navigation ...... 9

2.3.1 Move_base module ...... 10

2.3.2 Global planner ...... 10

2.3.3 Local planner ...... 10

2.3.4 Costmap ...... 14

2.3.5 Augmented Monte Carlo Localization (AMCL) ...... 15

2.4 Object Detection ...... 18


2.4.1 Deep Learning Using Convolutional Neural Networks (CNN) ...... 18

2.4.2 YOLO ...... 22

CHAPTER 3. METHODOLOGY ...... 25

3.1 Hardware Components...... 25

3.1.1 Controller ...... 25

3.1.2 Robotic moving base...... 25

3.1.3 3D-printed Dock ...... 26

3.1.4 Sensors ...... 26

3.1.5 Assembly...... 28

3.2 Software Implementation ...... 28

3.2.1 Preparation and Mapping ...... 28

3.2.2 Training ...... 29

3.2.3 Calibration for Camera and LiDAR ...... 31

3.2.4 Software Structure ...... 36

CHAPTER 4. EXPERIMENT ...... 39

4.1 Setup ...... 39

4.2 Result ...... 40

4.2.1 Choice of Local Planners ...... 40

4.2.2 Performances of Different Maximum Velocities ...... 40

4.2.3 General Performance ...... 41


4.2.4 Performances of Different Light Conditions ...... 43

4.2.5 Performances of Different Ground Conditions ...... 43

4.2.6 Performances with Confusions ...... 45

CHAPTER 5. DISCUSSION AND CONCLUSION ...... 46

REFERENCES ...... 48

APPENDIX A: INVOLVED LIBRARIES AND PACKAGES ...... 52

APPENDIX B: PROCEDURES OF DRIVING THE ROBOT AND THE LIDAR ...... 53


LIST OF TABLES

Table 1: The Pseudocode for Basic MCL Algorithm [9] ...... 16

Table 2: The Pseudocode for AMCL Algorithm [9] ...... 17

Table 3: The Abbreviations in Figures 15, 16, and 17 ...... 24

Table 4: The Performance for Different Maximum Linear Velocities ...... 41

Table 5: The Performances for the Detection and Placing of Different Numbers of Toy Cars ...... 42

Table 6: The Performance for Different Light Conditions ...... 43

Table 7: The Repeatability and the Performance for Different Ground Conditions ...... 44

Table 8: The Result of Toy Car Detection with Confusions ...... 45


LIST OF FIGURES

Figure 1: The Structure of a Typical ROS Package ...... 4

Figure 2: Communication Mechanism of ROS ...... 5

Figure 3: The Computation Graph of Gmapping ...... 8

Figure 4: The Architecture of Navigation Stack [44] ...... 9

Figure 5: Collision Check for each trajectory ...... 12

Figure 6: The Illustration of the 2nd Phase of the "Follow the Carrot" Algorithm ...... 13

Figure 7: The Inflation Process of Propagating Cost Values out from Occupied Cells [21] ...... 14

Figure 8: The Visualization of Navigation in RViz ...... 15

Figure 9: The Illustration of How MCL Algorithm Localizes ...... 16

Figure 10: The Frame Transformations for Fake_localization and AMCL ...... 18

Figure 11: Venn Diagram for Branches in Artificial Intelligence [22] ...... 19

Figure 12: The Differences between the Branches in Artificial Intelligence [22] ...... 20

Figure 13: The Architecture of a Typical CNN [48] ...... 20

Figure 14: Ball chart reporting the Top-1 accuracy (using only the center crop) vs. computational complexity (floating point operations per second required for a single forward pass) [The radius indicates the model complexity] [45] ...... 21

Figure 15: The Architecture of YOLOv3 [46] ...... 22

Figure 16: The Architecture of YOLOv4 [46] ...... 23

Figure 17: The Architecture of YOLOv5s [46] ...... 23

Figure 18: The Photograph of the 3D Printed Dock ...... 26

Figure 19: The 3D Drawing for the Assembly Platform ...... 27


Figure 20: The Photograph of the Assembly of All the Components...... 28

Figure 21: The Visualization of the SLAM Process in RViz ...... 29

Figure 22: The Interface of MakeSense.AI...... 30

Figure 23: The Result of Trained Custom Dataset ...... 31

Figure 24: The Illustration of the Relationship between the Camera, Image Frame ...... 32

Figure 25: The Illustration of the Related Frames [49] ...... 33

Figure 26: The Transformation and Projection between the Frames ...... 33

Figure 27: The Result of Finding the Patterns in Chessboard Using OpenCV Built-in Function ...... 34

Figure 28: The Illustration of LiDAR-camera Calibration ...... 35

Figure 29: The Computational Graph of The System ...... 36

Figure 30: Process of Detecting and Placing Toy Cars ...... 37

Figure 31: The Illustration of Docking Control based on Close-up Camera ...... 38

Figure 32: The Photograph of Experiment Scenario ...... 39

Figure 33: The Photographs of Different Light Conditions ...... 43

Figure 34: The Photographs of Different Ground Conditions ...... 44

Figure 35: The Confusion Objects for the Experiment ...... 45


CHAPTER 1: INTRODUCTION

1.1 Background

With the significant development of robotic technologies, robots are becoming ubiquitous. The applications of robots have extended broadly from their traditional industrial roles to the military, the medical field, and even daily services [1].

Robots which are designed to support, accompany, and nurse humans with exceptional capabilities and robustness are commonly called service robots [2]. Service robots can perform a wide range of tasks. Some of them perform domestic tasks which are typically dirty and repetitive, such as vacuuming, cleaning floors, mowing lawns, and cleaning windows; examples are the Roomba by iRobot and the Winbot by Ecovacs.

Some, for example, the Vector robot by Anki, are designed for entertainment or human companionship. Other types include assistance for the elderly and the handicapped, personal transportation, etc. [3]

The rise of service robots forecasts a human-robot collaborative society [2].

Modern households are becoming more automated, thereby delivering convenience and reducing the time spent on repetitive house chores. Currently, the most common domestic robot is the vacuum robot, a disk-shaped robot equipped with cleaning technology and built from microcontrollers, sensors, and drives [4]. It can navigate autonomously in indoor environments using patterned algorithms.

Toy cars are on most children's favorite toy lists, and many children may own a bunch of toy cars. A similar scenario may appear in every family with children: a child is diverted to something else, like sleeping or eating, after playing with his/her toy cars. It is common that these toy cars are left on the ground, which poses a hidden danger. This thesis aims to develop a domestic robot that detects and places toy cars by utilizing the ROS platform.

By using the iRobot Create 2 (a programmable version of the commercial Roomba) as a mobile base, it is possible to add object detection and navigation functions, enabling the robot to detect toy cars on the ground and place them at the designated location while exploring the space.

1.2 Objectives

The objective of the thesis is to implement a simple robot system with ROS. In the thesis, all the software packages were integrated with a self-programmed package, including writing launch files and configuration files to execute and tune stable navigation, using PyTorch and YOLO to train data and achieve real-time detection, and writing code to integrate the navigation and object detection. The programming language for the self-programmed part is mainly Python. The iRobot Create 2 was designed to patrol along designated coordinates to explore the experimental scenario, detect different numbers of toy cars on the ground, and push the detected toy cars to the final goal. The average performance of the process for different numbers of toy cars was evaluated from several runs of experiments to check stability and analyze potential failures.

1.3 Thesis Organization

The thesis is divided into five chapters. After declaring the background and objective of the thesis in Chapter 1, the theories of the related algorithms, including the structure of ROS and the principles of the utilized packages (gmapping, move_base, and YOLO), will be explained in Chapter 2. Chapter 3 will describe the methodology, including the mechanical design of the accessories required to implement the task, the software structure of the whole system, building the map with the SLAM stack, training data for object detection, and calibration of the camera and LiDAR. Chapter 4 will present experiments with the robot under different conditions, evaluate the performance, and analyze the reasons for potential failures. In Chapter 5, suggestions to improve the robot in the future, based on the results of the experiments in Chapter 4, will be discussed.


CHAPTER 2. THEORY

2.1 ROS

With the development and increasing complexity of the robotics field, modularity and reusability are in high demand. ROS, an open-source and distributed middleware framework for developing robot applications, was published to meet this demand [5]. ROS is officially supported on Linux operating systems. There are over 2,000 software packages, written and maintained by almost 600 people, and approximately 80 commercially available robots are supported [6]. The primary goal of ROS is to secure modularity and reusability in robotic research and development. Therefore, ROS is organized into packages, which require only minor changes before application or integration.

Figure 1: The Structure of a Typical ROS Package

The ROS filesystem aims to offer a centralized structure for building projects and to provide flexibility in distributing dependencies. Each package contains a ROS package manifest file (package.xml), a build file (CMakeLists.txt), programs, dependent libraries, configuration files, etc. The structure of a typical ROS package is shown in Figure 1. The shaded blocks are files and the unshaded blocks are folders, with the corresponding types of files specified in brackets. A build file contains a set of directives and instructions describing the project's executables and libraries. A ROS package manifest file declares information such as the package name, version, authors, maintainers, license, compilation flags, and dependencies on other packages [7].

Figure 2: Communication Mechanism of ROS

There are several key terms in ROS: node, message, topic, service, and action. Nodes are basically executable files performing computation and communication. The information passed between nodes is called a message. A topic is used for continuously publishing or subscribing to messages unidirectionally. A service is another way to pass data between nodes. For tasks executed occasionally within a bounded period, services allow nodes to request messages and await responses synchronously. Nodes can publish or subscribe to messages through a topic or exchange a synchronous request-and-response message through a service. An action is an advanced interface to implement goal-oriented, time-extended behaviors. While services are synchronous, actions are asynchronous. An action initializes a behavior with services for the goal and result. It also provides a topic for feedback and allows the goal to be canceled [8]. The communication mechanism between nodes with a topic, a service, or an action at the computational-graph level is shown in Figure 2. ROS also provides an effective visualization tool, RViz (the acronym for ROS visualization), which permits users to see robot models, sensor data, and other visualizable topics.
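To make the topic mechanism concrete, the following minimal sketch shows a publisher and a subscriber written with rospy, the ROS Python client library used for the self-programmed parts of this thesis. The topic name "chatter" and the message contents are illustrative assumptions, not part of the actual system.

#!/usr/bin/env python
# Minimal illustration of ROS topic communication with rospy.
import rospy
from std_msgs.msg import String

def talker():
    pub = rospy.Publisher('chatter', String, queue_size=10)
    rospy.init_node('talker')
    rate = rospy.Rate(1)  # publish once per second
    while not rospy.is_shutdown():
        pub.publish(String(data='hello'))
        rate.sleep()

def callback(msg):
    rospy.loginfo('heard: %s', msg.data)

def listener():
    rospy.init_node('listener')
    rospy.Subscriber('chatter', String, callback)
    rospy.spin()  # keep the node alive and process incoming messages

if __name__ == '__main__':
    talker()  # run listener() instead in a second node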

2.2 Simultaneous Localization and Mapping (SLAM)

The problem in which a robot acquires a map of its environment while simultaneously localizing itself relative to this map is abbreviated as SLAM [9]. The automated mapping of unknown environments is one of the fundamental problems of autonomous mobile robots.

Constructing a precise map requires quality localization of the robot, and the robot's localization, in turn, depends on the accuracy of the map. Many methods have been developed to solve this state estimation problem. Among them, the Rao–Blackwellized particle filter (RBPF) is widely employed. It was first applied by Michael Montemerlo in 2002, and the algorithm was called FastSLAM [10].


The posterior probability over the map $m$ and the trajectory $x_{1:t} = x_1, \ldots, x_t$, consisting of the robot's poses $x_t$ at each point in time $t$, given the sensor observations $z_{1:t} = z_1, \ldots, z_t$ and the odometry measurements $u_{1:t} = u_1, \ldots, u_t$, can be expressed as $p(x_{1:t}, m \mid z_{1:t}, u_{1:t})$. The essential idea of the RBPF is to divide this posterior into two parts as in Eqn. (2.1), according to the chain rule and after removing irrelevant conditions. $p(x_{1:t} \mid z_{1:t}, u_{1:t})$ represents the trajectory estimation of the robot and $p(m \mid x_{1:t}, z_{1:t})$ represents the map estimation given the trajectory and the sensor observations. This allows us to estimate the robot's trajectory first and then derive the map from it.

$$p(x_{1:t}, m \mid z_{1:t}, u_{1:t}) = p(x_{1:t} \mid z_{1:t}, u_{1:t})\, p(m \mid x_{1:t}, u_{1:t-1}, z_{1:t}) = p(x_{1:t} \mid z_{1:t}, u_{1:t})\, p(m \mid x_{1:t}, z_{1:t}) \quad (2.1)$$

A map with known trajectory and sensor observations, $p(m \mid x_{1:t}, z_{1:t})$, can be computed analytically [11]. The trajectory estimation $p(x_{1:t} \mid z_{1:t}, u_{1:t})$ can be factored [12] as in Eqn. (2.2), which allows us to predict the current pose $x_t$ incrementally from the last pose $x_{t-1}$ and the odometry measurement $u_t$ [13].

$$p(x_{1:t} \mid z_{1:t}, u_{1:t}) = p(x_{1:t} \mid z_t, u_{1:t}, z_{1:t-1})$$
$$= \eta\, p(z_t \mid x_{1:t}, u_{1:t}, z_{1:t-1}) \cdot p(x_{1:t} \mid u_{1:t}, z_{1:t-1})$$
$$= \eta\, p(z_t \mid x_t) \cdot p(x_t \mid x_{1:t-1}, u_{1:t}, z_{1:t-1})\, p(x_{1:t-1} \mid u_{1:t}, z_{1:t-1})$$
$$= \eta\, p(z_t \mid x_t) \cdot p(x_t \mid x_{t-1}, u_t)\, p(x_{1:t-1} \mid u_{1:t-1}, z_{1:t-1}) \quad (2.2)$$

where $1/\eta = p(z_t \mid u_{1:t}, z_{1:t-1}) = \text{const}$.

$p(x_{1:t-1} \mid u_{1:t-1}, z_{1:t-1})$ represents the robot's trajectory at the last moment. Each particle's current pose is predicted according to $p(x_t \mid x_{t-1}, u_t)$ to obtain its trajectory and is weighted with $p(z_t \mid x_t)$. The map can then be built according to the trajectory and the sensor observations. Each particle maintains the robot's pose of the last moment together with its corresponding weight and map. Therefore, the main problem of FastSLAM is that it requires a huge number of particles to build an accurate map, which makes it impractical in the real world.

2.2.1 Gmapping

In 2007, another SLAM algorithm called gmapping was published by Giorgio Grisetti [14]. Instead of the distribution $p(x_t \mid x_{t-1}^{(i)}, u_t)$, the gmapping algorithm predicts each particle's current pose according to $\arg\max\{p(z_t \mid x_t, m)\, p(x_t \mid x_{t-1}^{(i)}, u_t)\}$ and adaptively resamples when the change of weights exceeds a designated threshold.

The substituted proposal distribution can be derived as in Eqn. (2.3). Since the sensor observation is much more accurate than the odometry measurement, compared with $p(z_t \mid x_t, m)$, $p(x_t \mid x_{t-1}, u_t)$ can be considered as a constant [13].

$$p(x_t \mid z_t, u_t, x_{t-1}, m) = \zeta\, p(z_t \mid x_t, u_t, x_{t-1}, m)\, p(x_t \mid x_{t-1}, u_t, m) = \zeta\, p(z_t \mid x_t, m)\, p(x_t \mid x_{t-1}, u_t) = \mu\, p(z_t \mid x_t, m) \quad (2.3)$$

where $1/\zeta = p(z_t \mid u_t, x_{t-1}, m) = \text{const}$.

To date, gmapping remains the most popular SLAM algorithm for building maps of small spaces [15].

Figure 3: The Computation Graph of Gmapping


Gmapping has a ROS-wrapped package, which can be installed and applied directly after setting proper configuration parameters.

After a map is constructed by gmapping, map_server needs to be executed to save the map for future usage. The map will be saved as an image of the grid-occupancy map and a configuration .yaml file describing the parameters of the map.

The gmapping package requires the frame transformation from the LiDAR to the robot base (published by robot_state_publisher or static_transform_publisher) and from the robot base to the robot odometry (usually published by an odometry node); it then publishes the transformation from the map to the odometry frame. The ROS computation graph of the gmapping nodes is shown in Figure 3.
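As a hedged illustration of this transform requirement, the sketch below publishes a static transform from the robot base frame to the LiDAR frame with tf2_ros. The frame names base_link and laser and the mounting offset are assumptions for illustration; the actual values depend on the assembly described in Chapter 3.

#!/usr/bin/env python
# Publish a static base_link -> laser transform (illustrative offsets).
import rospy
import tf2_ros
from geometry_msgs.msg import TransformStamped

if __name__ == '__main__':
    rospy.init_node('laser_tf_publisher')
    broadcaster = tf2_ros.StaticTransformBroadcaster()

    t = TransformStamped()
    t.header.stamp = rospy.Time.now()
    t.header.frame_id = 'base_link'   # robot base frame
    t.child_frame_id = 'laser'        # LiDAR frame
    t.transform.translation.x = 0.0   # assumed mounting position (m)
    t.transform.translation.y = 0.0
    t.transform.translation.z = 0.20
    t.transform.rotation.w = 1.0      # identity rotation

    broadcaster.sendTransform(t)
    rospy.spin()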

2.3 Autonomous Navigation

Figure 4: The Architecture of Navigation Stack [44]

Like the SLAM described above, autonomous navigation is achieved through a ROS stack called the "navigation stack". The ROS navigation stack contains sub-packages for the various modules of navigation. The architecture of the stack is shown in Figure 4. Sections 2.3.1–2.3.5 introduce several important modules.


2.3.1 Move_base module

The move_base module is a ROS node that houses the central component of the navigation stack. It provides an action server that responds to users' requests and takes in goals, and it integrates all the nodes in the navigation stack to implement path planning and control.
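Because move_base exposes an action server, a node can send navigation goals through actionlib, as in the hedged Python sketch below. The goal coordinates here are placeholders; the actual patrol coordinates used in the thesis are defined in its source code.

#!/usr/bin/env python
# Send a single navigation goal to the move_base action server.
import rospy
import actionlib
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

def send_goal(x, y):
    client = actionlib.SimpleActionClient('move_base', MoveBaseAction)
    client.wait_for_server()

    goal = MoveBaseGoal()
    goal.target_pose.header.frame_id = 'map'      # goal expressed in the map frame
    goal.target_pose.header.stamp = rospy.Time.now()
    goal.target_pose.pose.position.x = x
    goal.target_pose.pose.position.y = y
    goal.target_pose.pose.orientation.w = 1.0     # face along the map x-axis

    client.send_goal(goal)
    client.wait_for_result()                      # block until move_base finishes
    return client.get_state()

if __name__ == '__main__':
    rospy.init_node('simple_goal_sender')
    send_goal(1.0, 0.5)   # placeholder coordinates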

2.3.2 Global planner

A global planner plans trajectories over the whole map. The ROS navigation stack includes an obstacle-avoiding package, navfn, which uses Dijkstra's algorithm to plan the global path for robots.

Dijkstra's algorithm is a graph-based motion planning method published by E. W. Dijkstra in 1959 [16]. It works by visiting nodes layer by layer. Beginning from the start, it repeatedly explores the unvisited adjacent nodes of the visited nodes, expanding outwards from the start to the goal. Dijkstra's algorithm is guaranteed to find the shortest path.
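A compact Python sketch of Dijkstra's algorithm on a 4-connected grid is given below. It is a generic illustration of the layer-by-layer expansion described above, not the navfn implementation itself.

import heapq

def dijkstra(grid, start, goal):
    """Shortest path on a 4-connected grid; cells with value 1 are obstacles."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    prev = {}
    queue = [(0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == goal:
            break
        if d > dist.get(node, float('inf')):
            continue  # stale queue entry
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nd = d + 1
                if nd < dist.get((nr, nc), float('inf')):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = node
                    heapq.heappush(queue, (nd, (nr, nc)))
    # reconstruct the path by walking the predecessor map backwards
    path, node = [], goal
    while node != start:
        path.append(node)
        node = prev[node]
    path.append(start)
    return path[::-1]

# Example: 0 = free, 1 = obstacle
grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(dijkstra(grid, (0, 0), (2, 0)))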

2.3.3 Local planner

Unlike global planners, a local planner in autonomous navigation works with the dynamic information currently obtained from the sensors and only plans the trajectories within the vicinity of the robot. Local planners also publish velocity commands frequently to drive robots based on local plans. The ROS navigation stack provides a local planner using the trajectory rollout algorithm (base_local_planner), as well as a local planner using the dynamic window approach (dwa_local_planner). ROS also allows users to apply other customized local planners, such as the one using the follow the carrot algorithm (asr_ftc_local_planner).


i. dwa_local_planner package

The dwa_local_planner package implements local navigation of a mobile robot on a plane with the dynamic window approach. The planner requires current speed information, which can be acquired from a wheel odometer or other sensors, and publishes the planned path according to the trajectories simulated within the simulation period.

Under the assumption of piecewise constant acceleration, $v(t_i)$ and $\omega(t_i)$ represent the linear and angular velocity of the robot, and the corresponding accelerations $\dot v_i$ and $\dot\omega_i$ remain constant in a time interval $[t_i, t_{i+1}]$ for $i = 1 \ldots n$. $x(t_i)$ and $y(t_i)$ denote the robot's coordinates, and $\theta(t_i)$ describes the robot's orientation at the moment $t_i$. The trajectory propagation shown in Eqn. (2.4) and Eqn. (2.5) can be derived from the equations of motion [17].

$$x(t_n) = x(t_0) + \sum_{i=0}^{n-1} \int_{t_i}^{t_{i+1}} \left(v(t_i) + \dot v_i (t - t_i)\right) \cos\!\left(\theta(t_i) + \omega(t_i)(t - t_i) + \tfrac{1}{2}\dot\omega_i (t - t_i)^2\right) dt \quad (2.4)$$

$$y(t_n) = y(t_0) + \sum_{i=0}^{n-1} \int_{t_i}^{t_{i+1}} \left(v(t_i) + \dot v_i (t - t_i)\right) \sin\!\left(\theta(t_i) + \omega(t_i)(t - t_i) + \tfrac{1}{2}\dot\omega_i (t - t_i)^2\right) dt \quad (2.5)$$

The idea of the dynamic window approach is to discretely sample velocities, perform forward simulation from the robot's current pose, and predict the trajectories that would result if the sampled velocities were applied for the simulation period. A velocity is considered admissible if each point in the generated trajectory is collision-free (illustrated in Figure 5). Each trajectory is evaluated with Eqn. (2.6), and the sample with the lowest cost is executed.

$$cost = s_1 \cdot d_p + s_2 \cdot d_g + s_3 \cdot c_{max} \quad (2.6)$$


where $s_1$, $s_2$, and $s_3$ are customized scaling factors, $d_p$ and $d_g$ are the distances from the endpoint of the trajectory to the global path and to the goal, respectively, and $c_{max}$ is the maximum obstacle cost along the trajectory [18].
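The velocity sampling and scoring loop can be sketched in Python as below. This is a simplified illustration of Eqn. (2.6) under stated assumptions (constant velocity during the simulation period, caller-supplied cost terms), not the dwa_local_planner source.

import math

def simulate(pose, v, w, sim_time, dt=0.1):
    """Forward-simulate a constant (v, w) command; return the trajectory points."""
    x, y, th = pose
    traj = []
    t = 0.0
    while t < sim_time:
        x += v * math.cos(th) * dt
        y += v * math.sin(th) * dt
        th += w * dt
        traj.append((x, y))
        t += dt
    return traj

def best_velocity(pose, samples, sim_time, path_dist, goal_dist, obstacle_cost,
                  s1=1.0, s2=1.0, s3=1.0):
    """Pick the admissible (v, w) sample minimizing Eqn. (2.6)."""
    best, best_cost = None, float('inf')
    for v, w in samples:
        traj = simulate(pose, v, w, sim_time)
        c_max = max(obstacle_cost(p) for p in traj)
        if c_max >= 254:          # assumed lethal-cost threshold: trajectory collides
            continue
        end = traj[-1]
        cost = s1 * path_dist(end) + s2 * goal_dist(end) + s3 * c_max
        if cost < best_cost:
            best, best_cost = (v, w), cost
    return best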

Figure 5: Collision Check for Each Trajectory

ii. asr_ftc_local_planner package

The asr_ftc_local_planner package is a local planner using the "follow the carrot" algorithm. Though this local planner is limited in that it can only be applied to differential-drive robots and can only drive robots forward, it has several benefits: the algorithm is suitable for indoor autonomous navigation because of its simple computation and few parameters, and because it keeps the robot close to the global plan.

The whole process is divided into three phases after setting a new goal:

1. rotating in-place to make the robot face the global plan’s orientation at the start point;


2. moving forward along the global path, while the local planner frequently publishes velocity commands to make the robot track the global plan;

3. rotating in-place again to make the robot face the global plan’s orientation at goal [19].

Figure 6: The Illustration of the 2nd Phase of the "Follow the Carrot" Algorithm

For the second phase, the planner always tries to reach the farthest point on the global path. As shown in Figure 6, the radius of the grey dashed circle equals the product of the maximum linear velocity and the simulation time. The intersection between the circle and the global path indicates the farthest reachable point on the global path based on the maximum linear velocity. The grey arrowed line represents the robot's pose after the first phase. The grey shaded circle sector, symmetric about the robot's pose, whose angle equals the product of the maximum angular velocity and twice the simulation time, denotes the reachable area according to the maximum angular velocity. After computing the farthest reachable point based on the linear velocity, the planner judges whether the maximum angular velocity is enough to reach the same point. If the condition is as in the red global path, the planner computes the necessary angular velocity to reach the target within the simulation time (the red point). If the condition is as in the azure global path, the planner computes the necessary linear velocity to reach the target within the simulation time (the azure point).
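A hedged sketch of this second phase is shown below: it picks the farthest point on the global path reachable at the maximum linear velocity (the "carrot"), then scales either the angular or the linear velocity so the robot can reach it within the simulation time. It is a simplified reading of the behavior described above, not the asr_ftc_local_planner source.

import math

def carrot_command(robot_pose, global_path, v_max, w_max, sim_time):
    """Return (v, w) aiming at the farthest reachable point on the global path."""
    x, y, th = robot_pose
    reach = v_max * sim_time
    # farthest path point inside the reachable circle (the "carrot")
    carrot = global_path[0]
    for px, py in global_path:
        if math.hypot(px - x, py - y) <= reach:
            carrot = (px, py)
    heading = math.atan2(carrot[1] - y, carrot[0] - x)
    dth = math.atan2(math.sin(heading - th), math.cos(heading - th))
    if abs(dth) <= w_max * sim_time:
        # angular velocity is sufficient: turn as needed and keep full speed
        return v_max, dth / sim_time
    # otherwise saturate the turn rate and slow down the forward motion
    w = math.copysign(w_max, dth)
    v = v_max * (w_max * sim_time) / abs(dth)
    return v, w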

2.3.4 Costmap

Figure 7: The Inflation Process of Propagating Cost Values out from Occupied Cells [21]

A costmap is a grid map where each cell is assigned a specific cost value [20].

The costmap_2d package is a plugin in the ROS navigation stack to form a costmap for path planning, which is generally the superposition of the costs in the static map layer, the obstacles layer, and the inflation layer.

Each grid cell in each layer possesses one byte to store its cost, ranging from 0 to 255, which defines the state of the cell and classifies it as occupied or free.

Each layer updates its state individually. Taking a pentagonal robot as an example, as shown in Figure 7, the cells inside the inscribed circle are in collision, the cells between the inscribed circle and the circumscribed circle are possibly in collision, and the cells outside the circumscribed circle are free space. Such a criterion is employed for obstacle inflation.

The static map layer contains a static costmap for the map generated by SLAM.

The obstacle layer is generated from sensor data. The inflation layer implements the obstacle inflation part. The cell costs are computed based on the occupancy grid map and a specified inflation radius. The costmap can be visualized as shown in Figure 8, where the blue area represents the inflated obstacles [21].
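The inflation criterion can be sketched as follows, assuming the commonly used exponential decay of cost with distance from the nearest obstacle; the exact constants and behavior of costmap_2d may differ.

import math

LETHAL = 254                # assumed cost of a cell containing an obstacle
INSCRIBED = 253             # assumed cost inside the robot's inscribed radius

def inflated_cost(distance, inscribed_radius, inflation_radius, scaling=10.0):
    """Cost of a cell at `distance` (m) from the nearest obstacle cell."""
    if distance == 0.0:
        return LETHAL                       # definitely in collision
    if distance <= inscribed_radius:
        return INSCRIBED                    # robot footprint overlaps the obstacle
    if distance > inflation_radius:
        return 0                            # free space
    # possibly in collision: decay the cost exponentially with distance
    return int((INSCRIBED - 1) * math.exp(-scaling * (distance - inscribed_radius)))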

Figure 8: The Visualization of Navigation in RViz

2.3.5 Augmented Monte Carlo Localization (AMCL)

The AMCL package implements the localization function based on probability. It is an improvement of basic Monte Carlo Localization (MCL).

MCL aims to match the LiDAR data against the occupancy grid map and to estimate the robot's most probable pose in the map based on the LiDAR data (shown in Figure 9).


Figure 9: The Illustration of How MCL Algorithm Localizes

The pseudocode of basic MCL is shown in Table 1. Basic MCL first randomly generates a set of particles $X_t = \{x_t^{[1]}, x_t^{[2]}, \ldots, x_t^{[M]}\}$ and predicts the movement of the particles based on the odometry motion model. Then, the weight of each particle is updated based on the beam measurement model, and low-weighted particles are replaced with copies of high-weighted particles. Finally, the weighted mean and covariance of the resampled particle set are computed to obtain the estimate of the robot's state.

Table 1: The Pseudocode for Basic MCL Algorithm [9]

Algorithm MCL(X_{t-1}, u_t, z_t, m):
    X̄_t = X_t = ∅
    for m = 1 to M do
        x_t^[m] = sample_motion_model(u_t, x_{t-1}^[m])
        w_t^[m] = measurement_model(z_t, x_t^[m], m)
        X̄_t = X̄_t + ⟨x_t^[m], w_t^[m]⟩
    endfor
    for m = 1 to M do
        draw i ∈ {1, ..., N} with probability ∝ w_t^[i]
        add x_t^[i] to X_t
    endfor
    return X_t


However, important particles may be lost after frequently copying high-weighted particles. To solve this problem, random particles are adaptively added under certain conditions. The pseudocode of AMCL is shown in Table 2. The parameters $\alpha_{slow}$ and $\alpha_{fast}$ are the decay rates of the exponential filters that estimate the long-term and short-term averages of the measurement likelihood, and they obey $0 \le \alpha_{slow} \ll \alpha_{fast}$. Random samples are only added when the short-term likelihood is worse than the long-term likelihood. In short, a sudden decay of the measurement likelihood induces an increase in the number of random samples.

Table 2: The Pseudocode for AMCL Algorithm [9]

Algorithm AMCL(X_{t-1}, u_t, z_t, m):
    static w_slow, w_fast
    X̄_t = X_t = ∅
    for m = 1 to M do
        x_t^[m] = sample_motion_model(u_t, x_{t-1}^[m])
        w_t^[m] = measurement_model(z_t, x_t^[m], m)
        X̄_t = X̄_t + ⟨x_t^[m], w_t^[m]⟩
        w_avg = w_avg + (1/M) w_t^[m]
    endfor
    w_slow = w_slow + α_slow (w_avg − w_slow)
    w_fast = w_fast + α_fast (w_avg − w_fast)
    for m = 1 to M do
        with probability max(0.0, 1.0 − w_fast / w_slow) do
            add a random pose to X_t
        else
            draw i ∈ {1, ..., N} with probability ∝ w_t^[i]
            add x_t^[i] to X_t
        endwith
    endfor
    return X_t
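The adaptive part of Table 2 can be expressed in Python as the sketch below. The motion model, measurement model, and random-pose generator are assumed to be supplied by the caller; only the w_slow/w_fast bookkeeping and the random-particle injection follow the pseudocode.

import random

def amcl_step(particles, u, z, m, motion_model, measurement_model,
              random_pose, state, a_slow=0.001, a_fast=0.1):
    """One AMCL update; `state` holds the persistent w_slow and w_fast."""
    M = len(particles)
    predicted, weights = [], []
    w_avg = 0.0
    for x in particles:
        x_new = motion_model(u, x)
        w = measurement_model(z, x_new, m)
        predicted.append(x_new)
        weights.append(w)
        w_avg += w / M
    state['w_slow'] += a_slow * (w_avg - state['w_slow'])
    state['w_fast'] += a_fast * (w_avg - state['w_fast'])
    p_random = 0.0
    if state['w_slow'] > 0.0:
        p_random = max(0.0, 1.0 - state['w_fast'] / state['w_slow'])
    resampled = []
    for _ in range(M):
        if random.random() < p_random:
            resampled.append(random_pose())          # inject a random particle
        else:
            resampled.append(random.choices(predicted, weights=weights)[0])
    return resampled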

The AMCL package in the navigation stack implements the AMCL approach. An initial pose is required by the AMCL node to provide an initial Gaussian distribution. The orange pose arrays in Figure 9 indicate the probability distribution of the robot's pose. After the initial pose is given, scattered pose arrays appear around the robot. They cluster gradually during movement, as shown in the small box of Figure 9.

The comparison of the frame transformations for fake_localization, which assumes no odometry drift so that the map and odometry frames always coincide, and for AMCL is shown in Figure 10. The AMCL package publishes the frame transformation from the odometry frame to the map frame to correct the robot's actual position.

Figure 10: The Frame Transformations for Fake_localization and AMCL

2.4 Object Detection

2.4.1 Deep Learning Using Convolutional Neural Networks (CNN)

Machine learning is a branch of artificial intelligence which builds models based on labeled samples to predict the labels of unknown samples. Besides the proper application of algorithms, the performance of machine learning also highly depends on the quality of the collected data. In machine learning, qualified features are manually extracted. However, it is not practical to manually extract useful features from complicated data such as images, videos, or audio. Therefore, representation learning (or feature learning) was developed, which can discover features automatically from raw data. Representation learning with multiple layers to abstract features is called deep learning.

Figure 11: Venn Diagram for Branches in Artificial Intelligence [22]

The relationship between domains in artificial intelligence is illustrated in Figure

11. Figure 12 shows the detailed differences in how different learning methods work. The shaded boxes in the figure indicate components learning from data [22]. Most deep learning models are based on artificial neural networks such as Convolutional Neural

Network (CNN). CNN is designed to process data in the form of multiple arrays such as

RGB images, which are composed of three color-layers of 2D pixel arrays [23].

The architecture of a typical CNN is composed of several convolution layers, pooling layers, and fully connected (FC) layers (Figure 13). Models are trained with loss functions through forward propagation on a training dataset. The parameters of each layer are updated using backpropagation. Various activation functions, e.g., the Rectified

Linear Unit (ReLU), are used to introduce non-linearity into the output of a neuron.
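As a small illustration of this layer structure, the PyTorch sketch below stacks two convolution/ReLU/pooling stages followed by a fully connected classifier. The layer sizes and the assumption of 32x32 RGB inputs are arbitrary choices made for the example.

import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Two conv/ReLU/pool stages followed by a fully connected classifier."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 3-channel RGB input
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)   # keep the batch dimension
        return self.classifier(x)

model = TinyCNN()
dummy = torch.randn(1, 3, 32, 32)        # one fake 32x32 RGB image
print(model(dummy).shape)                # torch.Size([1, 2])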


Figure 12: The Differences between the Branches in Artificial Intelligence [22]

Figure 13: The Architecture of a Typical CNN [48]


With the increase in computing power, deeper structures have been developed to achieve higher representation ability. The computational complexities of different deep neural networks are shown in Figure 14.

Figure 14: Ball chart reporting the Top-1 accuracy (using only the center crop) vs. computational complexity (floating point operations per second required for a single forward pass) [The radius indicates the model complexity] [45]


There are several frameworks for deep learning. They are open-source libraries in which deep neural networks can be structured through built-in modules. The two most popular are TensorFlow (built by the Google Brain Team in 2015) [24] and PyTorch (released by Adam Paszke et al. in 2016 and now maintained by Facebook) [25].

2.4.2 YOLO

The “You Only Look Once” (YOLO) model is a real-time object recognition model based on darknet, a simple deep learning framework. The first version was published by Joseph Redmon in 2016 [26]. In the next two years, the author released two improved versions successively [27] [28]. The architecture of YOLOv3 is shown in

Figure 15. In 2020, another author, Alexey Bochkovskiy, released the latest authorized version, YOLOv4, which improves the accuracy by a large margin compared with the prior versions and outperforms other detection methods [29]. The architecture of

YOLOv4 is shown in Figure 16.

Figure 15: The Architecture of YOLOv3 [46]


Figure 16: The Architecture of YOLOv4 [46]

Two months after the release of YOLOv4, another author, Glenn Jocher, published a YOLO version called YOLOv5, which is based on PyTorch instead of the original darknet and is therefore easier to apply [30]. The name YOLOv5 is controversial since its backbone and neck architecture is similar to that of YOLOv4, with only minor improvements [31].

Figure 17: The Architecture of YOLOv5s [46]


YOLOv5 contains four types of architectures, named with the suffix s for small, m for medium, l for large, and x for extra-large, according to the number of residual units in CSP1_X, the number of CBL blocks in CSP2_X, and the number of convolutional kernels. The architecture of YOLOv5s is shown in Figure 17.

The abbreviations in the figures are listed in Table 3.

Table 3: The Abbreviations in Figures 15, 16, and 17

Conv: convolution layer
BN: batch normalization
Leaky_relu: leaky rectified linear unit activation function, f(x) = x if x > 0, and 0.01x otherwise
CBL: Conv + BN + Leaky_relu
Res unit: residual unit which deepens the structure
Concat: tensor concatenation
Add: tensor addition
US: upsampling
Mish: Mish activation function, f(x) = x · tanh(softplus(x)), where softplus(x) = ln(1 + e^x)
CBM: Conv + BN + Mish
SPP: spatial pyramid pooling
Slice: data slice
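Because YOLOv5 is distributed as a PyTorch project, a trained model can be loaded and queried from Python with torch.hub, as in the hedged sketch below. The weight file name best.pt and the image path are placeholders; in this thesis the detection is integrated into a ROS node rather than a standalone script.

import torch

# Load custom weights produced by YOLOv5 training (file name is a placeholder).
model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')
model.conf = 0.5                       # confidence threshold

results = model('toy_car_scene.jpg')   # path or numpy image (e.g., a camera frame)
detections = results.xyxy[0]           # tensor rows: [x1, y1, x2, y2, confidence, class]
for *box, conf, cls in detections.tolist():
    print('class %d at %s with confidence %.2f' % (int(cls), box, conf))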


CHAPTER 3. METHODOLOGY

3.1 Hardware Components

The hardware components used to implement the functions of SLAM, navigation, and object detection are listed in Sections 3.1.1–3.1.4.

3.1.1 Controller

To implement the computation and control for SLAM, navigation, and object detection, a controller installed with the Linux operating system is essential. Additionally, a laptop equipped with a mid-range graphics processing unit enables real-time object detection at high speed. The specifications of the chosen laptop are listed below:

Device: Dell G3 3500

Memory: 7.6GB

Processor: Intel® Core™ i5-10300H CPU

Graphics: Nvidia GeForce GTX 1650Ti

Operating system: Ubuntu 18.04.5 LTS

3.1.2 Robotic moving base

iRobot Create 2 is an economical and programmable differential drive robot.

The two wheels of a differential drive mobile robot can be driven separately based on commanded linear and angular velocities. By employing a differential drive robot, geometric constraints can be disregarded while controlling the robot to move along a planned path. Considering its maneuverability and economic efficiency, it was chosen as the mobile base for this thesis.


3.1.3 3D-printed Dock

Due to the limitations of time and hardware, it was not feasible to grasp the detected toy cars with a gripper mounted on the robot. An alternative solution for placing a toy car at a designated position is to dock the detected toy car and keep it in front of the robot while moving and rotating until it reaches the goal. The 3D-printed dock is shown in Figure 18. The length and width were tested to guarantee its ability to keep the toy car inside while moving forward and rotating, and to let the toy car escape from it while moving backward at the goal position. The robot can be controlled after checking the relative position between the dock and the detected toy car.

Figure 18: The Photograph of the 3D Printed Dock

3.1.4 Sensors

i. LiDAR

A LiDAR sensor is employed for building maps with SLAM and for obstacle detection during navigation. Considering the application environment and economic efficiency, a SLAMTEC RPLidar A1M8, whose specifications are listed below [32], is applied in this project:


Measurement range: 0.15-12 m

Angular range: 0-360 °

Measurement resolution: < 1% × actual distance

Angular resolution: ≤ 1 °

Measurement frequency: 8000 Hz (at Boost scan mode)

Scan frequency: 5.5 Hz

ii. Web cameras

Cameras are used for object detection and for estimating object positions. Two cameras are applied: a close-up camera and a full-shot camera. A Logitech® C270 webcam, the full-shot camera, is used for object detection while the robot explores the room. A Logitech® C930e webcam, the close-up camera, is used for checking the relative position between the toy car and the dock and for controlling the robot accurately to place the toy car inside the dock.

Figure 19: The 3D Drawing for the Assembly Platform


3.1.5 Assembly

To integrate all the equipment and sensors mentioned above and keep them from affecting each other, the assembly frame shown in Figure 19 was manufactured. The assembly of all the components mentioned above is shown in Figure 20.

Figure 20: The Photograph of the Assembly of All the Components

3.2 Software Implementation

3.2.1 Preparation and Mapping

The ROS-wrapped drivers for the iRobot Create 2 and the RPLidar can be found online [33] [34]. After all the packages were successfully connected, a program to publish velocity commands to the robot from a keyboard was written for remote control during SLAM. The gmapping node was properly configured and launched to build the map. Using RViz, the process and the quality of mapping can be visualized (Figure 21).

Figure 21: The Visualization of the SLAM Process in RViz

After the map was built, the configuration parameters for the move_base node were tuned to maintain a relatively stable navigation performance. The values of all the tuned parameters used in the thesis can be checked in the source code.

3.2.2 Training

The images can be labeled with MakeSense.AI (Figure 22). Labels are exported as text files with the same names as the image files in YOLO format, which contains the class labels and the center coordinates, widths, and heights of the bounding boxes. The dataset needs to be separated into a training set and a validation set. The network architecture was mentioned at the end of Section 2.4.2. After modifying the file paths, the number of classes, and the class names in the data configuration of the YOLOv5 package and organizing all the necessary files, the dataset was trained for several hours.
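For reference, the sketch below converts a pixel-space bounding box into one YOLO-format label line (class index followed by the normalized center coordinates, width, and height). The image size and box values are made-up examples.

def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """One label line: class, center x, center y, width, height (all normalized)."""
    cx = (x_min + x_max) / 2.0 / img_w
    cy = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / float(img_w)
    h = (y_max - y_min) / float(img_h)
    return '%d %.6f %.6f %.6f %.6f' % (class_id, cx, cy, w, h)

# Example: a toy car box in a 640x480 image, class index 0
print(to_yolo_line(0, 100, 200, 220, 280, 640, 480))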


Figure 22: The Interface of MakeSense.AI

The results of training on the custom dataset are shown in Figure 23. The precision (Eqn. (3.1)) indicates the degree of false-positive prediction, while the recall (Eqn. (3.2)) indicates the degree of false-negative prediction [35]. The Generalized Intersection over Union (GIoU) (Eqn. (3.3)) describes the quality of a predicted bounding box and is calculated from the degree of coincidence between the predicted bounding box A and the labeled bounding box B, where C is the smallest box that encloses both A and B [36].

$$Precision = \frac{True\ Positive}{True\ Positive + False\ Positive} \quad (3.1)$$

$$Recall = \frac{True\ Positive}{True\ Positive + False\ Negative} \quad (3.2)$$

$$GIoU = \frac{|A \cap B|}{|A \cup B|} - \frac{|C \setminus (A \cup B)|}{|C|} \quad (3.3)$$

$$GIoU\ Loss = 1 - GIoU \quad (3.4)$$
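Eqn. (3.3) can be computed directly for axis-aligned boxes, as in the sketch below; boxes are given as (x1, y1, x2, y2) corners, and the values in the example are arbitrary.

def giou(a, b):
    """Generalized IoU of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # intersection area
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    # smallest enclosing box C
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    area_c = cw * ch
    return inter / union - (area_c - union) / area_c

a = (0, 0, 2, 2)
b = (1, 1, 3, 3)
print(giou(a, b), 1 - giou(a, b))   # GIoU and GIoU loss (Eqn. (3.4))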


Figure 23 shows that the GIoU loss (Eqn. (3.4)) converges for both the training and the validation set over epochs, i.e., passes of the entire dataset forward and backward through the neural network [37]. The precision, the recall, and the mean average precision for bounding boxes whose GIoU is larger than 0.5 ([email protected]) stand at 1.0. The mean average precision for bounding boxes whose GIoU ranges from 0.5 to 0.95 ([email protected]:0.95) stands at 0.57.

Figure 23: The Result of Trained Custom Dataset

3.2.3 Calibration for Camera and LiDAR

Camera calibration is the process of estimating the parameters of the lens and the image sensor, including the intrinsic, extrinsic, and distortion coefficients, which can be used for applications such as measuring the geometric properties of objects and estimating the location of the camera in the world frame [38]. As shown in Figure 24, the coordinates of a point in the camera frame can be transformed into the (virtual) image plane through central projection [39].

Figure 24: The Illustration of the Relationship between the Camera, Image Frame

There are four coordinate frames involved in the transformation from the world frame to the pixel frame. The world frame is arbitrary. The camera frame's origin is the pinhole of the camera and its z-axis coincides with the optical axis (the frame Fc at the bottom right of Figure 25). The xy-plane of the image frame is perpendicular to the z-axis of the camera frame and its origin coincides with the focal point (the azure xy-coordinate in Figure 25). The pixel frame is simply a 2D rotation and translation of the image frame (the tan uv-coordinate in Figure 25).

The intrinsic parameter matrix K of a camera describes the projection from the camera frame to the pixel frame. The relationship, neglecting distortion, is shown in Eqn. (3.5).

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \frac{1}{Z_c}\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = \frac{1}{Z_c}\, K \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} \quad (3.5)$$


Figure 25: The Illustration of the Related Frames [49]

However, cameras usually exhibit distortion. The corrected coordinates for radial and tangential distortion in the image frame are expressed in Eqn. (3.6) and Eqn. (3.7), respectively, where $r^2 = x^2 + y^2$ [40].

$$\begin{cases} x_{corrected} = x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) \\ y_{corrected} = y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) \end{cases} \quad (3.6)$$

$$\begin{cases} x_{corrected} = x + [\,2 p_1 x y + p_2 (r^2 + 2 x^2)\,] \\ y_{corrected} = y + [\,2 p_2 x y + p_1 (r^2 + 2 y^2)\,] \end{cases} \quad (3.7)$$

Figure 26: The Transformation and Projection between the Frames


The transformations and projections from the pixel frame to the world frame are shown in Figure 26. The extrinsic parameters of a camera describe the transformation from the world frame to the camera frame, where R and T are the rotation matrix and translation vector, respectively.

Figure 27: The Result of Finding the Patterns in Chessboard Using OpenCV Built-in Function

OpenCV has a built-in function (cv2.findChessboardCorners) to detect the patterns of a chessboard (Figure 27), which can be applied to camera calibration. The intrinsic matrix K and the distortion parameters [k1, k2, p1, p2, k3] of a camera can be computed with the OpenCV built-in function cv2.calibrateCamera. The extrinsic parameters R and T of a camera can be computed with the OpenCV built-in function cv2.solvePnP [41]. These built-in functions in OpenCV consider the left-bottom corner of the chessboard as the origin of the world frame and set the z-axis perpendicular to the chessboard. By calibrating with a chessboard lying on the ground, the transformation from the camera to a world frame whose xy-plane coincides with the ground is obtained.
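The calibration pipeline described above can be sketched with the OpenCV functions named in the text. The 9x6 inner-corner pattern, the 25 mm square size, and the file names are assumptions made for illustration.

import glob
import cv2
import numpy as np

pattern = (9, 6)            # assumed inner corners of the chessboard
square = 0.025              # assumed square size in meters

# world coordinates of the corners on the board plane (z = 0)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_points, img_points = [], []
for name in glob.glob('calib_images/*.jpg'):     # placeholder image folder
    gray = cv2.cvtColor(cv2.imread(name), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# intrinsic matrix K and distortion coefficients [k1, k2, p1, p2, k3]
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)

# extrinsics of the board lying on the ground, from one image of it (placeholder name)
found, corners = cv2.findChessboardCorners(
    cv2.cvtColor(cv2.imread('ground_board.jpg'), cv2.COLOR_BGR2GRAY), pattern)
ok, rvec, tvec = cv2.solvePnP(objp, corners, K, dist)
R, _ = cv2.Rodrigues(rvec)      # rotation matrix R and translation vector tvec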

The coordinates (u, v, 1) of the bounding boxes of detected toy cars in the pixel frame are known, and the intrinsic parameters K and extrinsic parameters R, T of the camera have been calibrated. Assuming that toy cars are always on the ground ($Z_w = 0$ in Eqn. (3.8)), the scaling parameter $Z_c$ can be computed. Then the coordinates of the detected toy cars in the camera frame can be obtained according to Eqn. (3.5).

$$\begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} = R^{-1}\left(\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} - T\right) = Z_c \cdot R^{-1} K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} - R^{-1} T \quad (3.8)$$
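A numpy sketch of Eqn. (3.8) is given below: with the ground-plane assumption $Z_w = 0$, the third row of the equation fixes the unknown scale $Z_c$, after which the world coordinates follow directly. K, R, and T are the calibrated parameters from the previous paragraphs.

import numpy as np

def pixel_to_ground(u, v, K, R, T):
    """Back-project pixel (u, v) onto the ground plane Z_w = 0 (Eqn. (3.8))."""
    uv1 = np.array([u, v, 1.0])
    R_inv = np.linalg.inv(R)
    A = R_inv @ np.linalg.inv(K) @ uv1      # direction term R^-1 K^-1 [u, v, 1]^T
    b = R_inv @ T.reshape(3)                # offset term R^-1 T
    Zc = b[2] / A[2]                        # choose Z_c so that Z_w = Zc*A[2] - b[2] = 0
    world = Zc * A - b
    return world[:2], Zc                    # (X_w, Y_w) on the ground and the scale Z_c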

Imagine that a 2D plane is formed by the LiDAR beam (the red area shown in Figure 28). The LiDAR beam plane and the chessboard plane intersect in a line, and a point on this line in the LiDAR frame can be represented as $(X_l, Y_l, 0)$. The constraint on a point $P_c$ on the chessboard plane in the camera frame can be described with the normal vector $\boldsymbol{n}_c$ of the chessboard plane and its distance $d$ to the camera frame's origin (Eqn. (3.9)). The rotation matrix and translation vector from the camera frame to the LiDAR frame are denoted as $R_c^l$ and $\boldsymbol{T}_c^l$. Then Eqn. (3.9) can be expressed as Eqn. (3.10) [42].

$$\boldsymbol{n}_c^T P_c + d = 0 \quad (3.9)$$

$$\boldsymbol{n}_c^T\left[R_c^l\left(\begin{bmatrix} X_l \\ Y_l \\ 0 \end{bmatrix} - \boldsymbol{T}_c^l\right)\right] = \boldsymbol{n}_c^T\, R_c^l \begin{bmatrix} 1 & 0 & \\ 0 & 1 & -\boldsymbol{T}_c^l \\ 0 & 0 & \end{bmatrix} \begin{bmatrix} X_l \\ Y_l \\ 1 \end{bmatrix} = -d \quad (3.10)$$

Figure 28:The Illustration of LiDAR-camera Calibration


Since $\boldsymbol{n}_c$ and $d$ can be computed from the camera's extrinsic parameters and the coordinates in the LiDAR frame can be obtained from the LiDAR input, $R_c^l$ and $\boldsymbol{T}_c^l$ can be solved by singular value decomposition [43].

By transforming the frames from the pixel to the camera, and then the camera to the LiDAR, the relative positions of detected toy cars in the LiDAR frame can be obtained.

3.2.4 Software Structure

The drivers of iRobot Create 2 and RPlidar A1 already existed. A map was constructed with gmapping SLAM. YOLOv5 was applied to train the custom dataset of toy cars and implement real-time detection.

Figure 29: The Computational Graph of The System

The ROS computational graph of the designed system is shown in Figure 29.

Each ellipse represents a ROS node. Messages from the LiDAR were passed to the move_base and AMCL nodes for obstacle detection and localization. Messages of frame transformations among the LiDAR, the robot base, the odometry, and the map were passed among all the nodes. By using the packaged navigation stack, the robot was able to receive velocity commands from the local planner and autonomously move among the designated goals. The major work of the system was creating a package to transform frames from the pixel frame to the LiDAR frame and to integrate all the nodes.

The flowchart of the detection and placing process is shown in Figure 30. The URLs of all the packages and the source code of the thesis are listed in APPENDIX A. The robot first rotates at the start point to check whether there are toy cars around; it then explores the room by visiting specific coordinates in the world frame sequentially. If a toy car is found, the robot pushes it to the goal and then starts another exploration. If no toy car is found during exploration, the process ends when the robot reaches the final point of the patrol path.

Figure 30: Process of Detecting and Placing Toy Cars


When the detecting-and-placing process starts, the full-shot camera and the LiDAR begin to search for and localize toy cars. When a toy car is found, the robot autonomously navigates to the vicinity of the toy car. The field of view of the close-up camera is divided into six areas, as shown in Figure 31. Based on the toy car's position, the robot moves to place the toy car into the dock. The process ends when one of the detected toy cars is located in the "finish" area.

Figure 31: The Illustration of Docking Control based on Close-up Camera
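A hedged sketch of this decision logic is shown below; the six areas of Figure 31 are approximated by a few simple conditions, and the area boundaries, expressed as fractions of the close-up image, are placeholders for the thresholds tuned in the thesis source code.

def docking_command(box_center, img_w, img_h):
    """Map the detected toy car's position in the close-up image to a motion command."""
    cx, cy = box_center
    x, y = cx / float(img_w), cy / float(img_h)   # normalized position
    if y > 0.85 and 0.35 < x < 0.65:
        return 'finish'            # toy car is inside the dock region
    if y > 0.85:
        return 'move backward'     # too close but off-center: back up and retry
    if x < 0.35:
        return 'rotate left'
    if x > 0.65:
        return 'rotate right'
    return 'move forward'          # centered but still far: approach the toy car

# Example: a detection centered at pixel (320, 400) in a 640x480 close-up frame
print(docking_command((320, 400), 640, 480))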


CHAPTER 4. EXPERIMENT

4.1 Setup

The camera's detection frequency is about 100 frames per second. Therefore, the maximum velocities of the robot were set to low values (linear: 0.18 m/s, angular: 0.4 rad/s).

The robot was set to perform detection-and-placing in the scenario shown in Figure 32.

The yellow dashed line is the expected patrol route, and the green arrowed lines indicate the designated poses. When a toy car is detected, it will be placed at the red arrowed line at the bottom right. (This position is referred to as the final goal position in the following text.)

Figure 32: The Photograph of Experiment Scenario

Different numbers of toy cars, ranging from one to six, were placed on the ground. Each run of the experiment was evaluated by the total time, the average time to place one toy car, the accuracy (the ratio of the number of detected toy cars to the total number of toy cars on the ground), and the success rate (the ratio of the number of successfully placed toy cars to the total number of toy cars on the ground).

4.2 Result

4.2.1 Choice of Local Planners

Both the local planner using the dynamic window approach and the one using the follow-the-carrot algorithm were tested for navigation. However, when executed on the actual robot, the follow-the-carrot algorithm took 104 seconds for a patrol circle on average, whereas the dynamic window approach failed to complete a patrol circle, since it is not suitable for low-velocity navigation inside a small room. When using the dynamic window approach, the robot was trapped in a deadlock of accelerating when near a toy car, missing the goal, and navigating again. This may be caused by the latency in transmitting location information. Also, the dynamic window approach allows the robot to move backward, which causes failures in pushing a detected toy car to the final goal.

4.2.2 Performances of Different Maximum Velocities

In each run the robot started from the final goal. A toy car was located at the same point (the first point of the patrol route). The maximum angular velocity was kept at 0.4 rad/s and the maximum linear velocity was doubled from that of the previous run. Though the robot is still able to detect at a linear velocity of up to 0.72 m/s, such a speed is too fast for a small space and may cause navigation failures. Since toy cars were seldom detected during rotation, the influence of the maximum angular velocity can be neglected.


The 0.18 m/s and 0.36 m/s settings shared the same detection accuracy. The higher maximum linear velocity reduced the total time of a run by about one minute. However, a higher velocity may separate a toy car from the dock and make the assembly unstable. Therefore, 0.18 m/s is suitable for the system.

Table 4: The Performance for Different Maximum Linear Velocities

Run # | Maximum Linear Velocity (m/s) | Total Time (s) | Toy car # | Detected # | Accuracy (%)
1 | 0.18 | 181 | 1 | 1 | 100.0%
2 | 0.36 | 139 | 1 | 1 | 100.0%
3 | 0.72 | - | 1 | 1 | 100.0%

4.2.3 General Performance

To evaluate the general performance of the system, several experiments were conducted under the same light and ground conditions. Since the system requires a final check during one patrol circle, the difference between the total time and the time for one patrol circle is divided by the number of placed toy cars to compute the average time for placing one toy car. The test results are shown in Table 5; the average time to place one toy car is about one minute. A run without a total time indicates that the run did not complete the process. The average accuracy of real-time detection is 83.1% and the success rate of placing the toy cars at the designated final goal is 78.9%.

Run 6 failed to place the last two detected toy cars. These cars were placed at a wrong goal because of localization failures, which erroneously flipped the map.


Table 5: The Performances for the Detection and Placing of Different Numbers of Toy Cars

Run # | Total Time (s) | Toy car # | Detected # | Placed # | Accuracy (%) | Success Rate (%) | Avg. time per toy car (s)
1 | 183 | 1 | 1 | 1 | 100.0% | 100.0% | 79
2 | 190 | 2 | 1 | 2 | 50.0% | 100.0% | 43
3 | 212 | 2 | 2 | 2 | 100.0% | 100.0% | 54
4 | 252 | 3 | 2 | 2 | 66.7% | 66.7% | 74
5 | 303 | 3 | 3 | 3 | 100.0% | 100.0% | 66
6 | - | 3 | 3 | 1 | 100.0% | 33.3% | -
7 | 350 | 5 | 4 | 4 | 80.0% | 80.0% | 62
8 | 320 | 5 | 5 | 5 | 100.0% | 100.0% | 43
9 | 394 | 6 | 5 | 5 | 83.3% | 83.3% | 58
10 | 232 | 6 | 3 | 3 | 50.0% | 50.0% | 43
11 | - | 6 | 4 | 2 | 66.7% | 33.3% | -
12 | 475 | 6 | 6 | 6 | 100.0% | 100.0% | 62
Average | | | | | 83.1% | 78.9% | 60

In run 11, the close-up camera failed to detect the target toy car. Due to insufficient sample diversity in training, toy cars close to the bottom of the image are hard to detect. When a toy car in the "finish" area was not detected but one in the "move backward" area was detected, the robot moved backward, which could push the toy car that used to be in the "finish" area out of that area so that it became detectable again. However, when the robot then moved forward, it failed to detect that toy car again and kept moving forward and backward indefinitely.

In some runs, the average time for placing one toy car is much shorter than others due to the robot occasionally pushing another toy car on its way to the final goal.


4.2.4 Performances of Different Light Conditions

The experiments were conducted to evaluate the system performance under different light conditions (Figure 33). Each run was conducted under the same circumstances (number and positions of toy cars, start point of the robot). As shown in Table 6, except for a failure caused by a detection failure of the close-up camera in a dim environment, the system fulfilled the designed purpose most of the time.

Table 6: The Performance for Different Light Conditions

Run # | Light Condition | Toy car # | Detected # | Placed # | Accuracy (%) | Success Rate (%)
1 | Sunlight | 3 | 3 | 3 | 100.0% | 100.0%
2 | Lamplight | 3 | 3 | 3 | 100.0% | 100.0%
3 | Dim | 3 | 3 | 2 | 100.0% | 66.7%

Figure 33: The Photographs of Different Light Conditions

4.2.5 Performances of Different Ground Conditions

The experiments were conducted to evaluate the system performance under different ground conditions (a smooth board and a rough carpet, as shown in Figure 34). Each run was conducted under the same circumstances (number and positions of toy cars, start point of the robot). As shown in Table 7, under the smooth ground condition, the repeatability of the system was 80% (4 successful runs out of 5 total runs). Except for a longer total time caused by friction and detection speed, the system fulfilled the designed purpose on both ground conditions most of the time.

Table 7: The Repeatability and the Performance for Different Ground Conditions

Run # | Ground Condition | Total Time (s) | Toy car # | Detected # | Placed # | Accuracy (%) | Success Rate (%)
1 | Smooth | 313 | 2 | 2 | 2 | 100.0% | 100.0%
2 | Smooth | 274 | 2 | 2 | 2 | 100.0% | 100.0%
3 | Smooth | - | 2 | 2 | 1 | 100.0% | 50.0%
4 | Smooth | 207 | 2 | 2 | 2 | 100.0% | 100.0%
5 | Smooth | 213 | 2 | 2 | 2 | 100.0% | 100.0%
6 | Rough | 320 | 2 | 2 | 2 | 100.0% | 100.0%

Figure 34: The Photographs of Different Ground Conditions


4.2.6 Performances with Confusions

Five objects (Figure 35) were prepared to check the detection performance of the system. The detection results are shown in Table 8. Due to the insufficient number and diversity of samples in the custom dataset, objects with shapes similar to toy cars may be detected by mistake, which may cause the process to fail under some conditions.

Figure 35: The Confusion Objects for the Experiment

Table 8: The Result of Toy Car Detection with Confusions

Object | Predicted
1. Tape | Not a toy car
2. Toy Car | Toy car
3. Key Ring | Not a toy car
4. Decoration | Toy car
5. Toy Car | Toy car
Accuracy | 80%


CHAPTER 5. DISCUSSION AND CONCLUSION

In this thesis, we implemented the detection and placement of toy cars with iRobot Create 2 on ROS. The intended functions were fulfilled in most runs of experiments and under most conditions.

Since the odometry of the iRobot Create 2 is not accurate, using AMCL for localization still requires enormous computational power, and it causes poor performance of the local planner using the dynamic window approach. A stable and simple algorithm, the follow-the-carrot algorithm, was therefore applied to follow the global path. Additionally, target objects near the wall cause navigation failures.

Due to the insufficient number and diversity of the prepared custom dataset, toy cars cannot be detected when they are very close to the robot. Objects with shapes similar to toy cars may also be detected by mistake, which may cause failures under some conditions.

The linear velocity was set to 0.18 m/s for stable detection and navigation. Higher velocity limits were tested, which showed that YOLO can detect objects in real time at a higher speed. However, a higher velocity caused instability of the assembly, which has a high center of gravity, and of the navigation.

Due to the mechanical limitation of the dock, to keep a toy car inside the dock, the robot must push it by moving forward rather than moving backward. Therefore, algorithms that allow the robot to move backward are not suitable for this robot, which reduces the efficiency. A robotic arm should be introduced to grip the toy cars instead of simply pushing them, to solve the mentioned issues.


Moreover, it is not realistic that only toy cars would be present on the ground. Thus, in a real-world application, classification beyond detection is required. More types of objects should be trained in the YOLO network so that the robot can classify different objects and put them at different goals.


REFERENCES

[1] Y. Xu, H. Qian and X. Wu, Household Service Robotics, 1st ed., Elsevier Science & Technology, 2014, p. 3.

[2] T. Haidegger, M. Barreto, P. Gonçalves, M. K. Habib, S. Ragavan, V. Kumar, H. Li, A. Vaccarella, R. Perrone and E. Prestes, "Applied Ontologies and Standards for Service Robots," Robotics and Autonomous Systems, vol. 61, no. 11, pp. 1215- 1223, 2013.

[3] I. F. o. Robotics, M. Hägele, L. Schuhmacher, K. Röhricht and T. Zimmermann, "Introduction into Service Robots," International Federation of Robotics, 2015.

[4] T. Asafa, T. Afonja, E. Olaniyan and H. Alade, "Development of a Vacuum Cleaner Robot," Alexandria Engineering Journal, vol. 57, no. 4, pp. 2911-2920, 2018.

[5] Open Robotics, "ROS Introduction," [Online]. Available: http://wiki.ros.org/ROS/Introduction.

[6] M. Quigley, B. Gerkey and W. D. Smart, Programming Robots with ROS: A Practical Introduction to the Robot Operating System, O'Reilly Media, Inc, 2015.

[7] Open Robotics, "ROS Concepts," 21 June 2014. [Online]. Available: http://wiki.ros.org/ROS/Concepts.

[8] T. Shah, "ROS Basics 1 - Nodes, Topics, Services & Actions," 1 April 2017. [Online]. Available: https://tarangshah.com/blog/2017-04-01/ros-basics-1-nodes- topics-services-actions/.

[9] S. Thrun, Probabilistic Robotics, Cambridge, Massachusetts: The MIT Press, 2006, p. 245.

[10] M. Montemerlo, S. Thrun, D. Koller and B. Wegbreit, "FastSLAM: A Factored Solution to the Simultaneous Localization and Mapping Problem," in AAAI-02 Proceedings, 2002.

[11] K. P. Murphy, "Bayesian Map Learning in Dynamic Environments," in Proceedings of the 12th International Conference on Neural Information Processing Systems, Denver, 1999.

[12] Wikipedia, "Bayes' Theorem," 19 April 2021. [Online]. Available: https://en.wikipedia.org/wiki/Bayes%27_theorem.


[13] "Principle of Gmapping," [Online]. Available: https://zhuanlan.zhihu.com/p/262287388.

[14] G. Grisetti, C. Stachniss and W. Burgard, "Improved Techniques for Grid Mapping with Rao-Blackwellized Particle Filters," in IEEE Transactions on Robotics, 2007.

[15] X. Zhang, J. Lai, D. Xu, H. Li and M. Fu, "2D Lidar-Based SLAM and Path Planning for Indoor Rescue Using Mobile Robots," Journal of Advanced Transportation, 2020.

[16] E. Dijkstra, "A Note on Two Problems in Connexion with Graphs," Numerische Mathematik, vol. 1, pp. 269-271, 1959.

[17] "The Dynamic Window Approach to Collision Avoidance," IEEE Robotics & Magazine, vol. 4, no. 1, pp. 23-33, 1997.

[18] Open Robotics, "Dwa_local_planner Package Summary," [Online]. Available: http://wiki.ros.org/dwa_local_planner.

[19] M. Felix, "Asr_ftc_local_planner Package Summary," [Online]. Available: http://wiki.ros.org/asr_ftc_local_planner.

[20] R. D. Schaetzen, "Configuring the ROS Navigation Stack on a new robot," 12 March 2019. [Online]. Available: https://blog.zhaw.ch/icclab/configuring-the-ros- navigation-stack-on-a-new- robot/#:~:text=A%20costmap%20is%20a%20grid,the%20robot%20and%20an%2 0obstacle.

[21] Open Robotics, "Costmap_2d Package Summary," 10 January 2018. [Online]. Available: http://wiki.ros.org/costmap_2d.

[22] I. Goodfellow, Deep Learning, The MIT Press, 2016.

[23] L. Yann, B. Yoshua and H. Geoffery, "Deep Learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015.

[24] Wikipedia, "TensorFlow," [Online]. Available: https://en.wikipedia.org/wiki/TensorFlow.

[25] Wikipedia, "PyTorch," [Online]. Available: https://en.wikipedia.org/wiki/PyTorch.


[26] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in 2016 IEEE Conference on Vision and Pattern Recognition, Las Vegas, NV, 2016.

[27] J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," in 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, 2017.

[28] J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," 2018.

[29] A. Bochkovskiy, . C.-Y. Wang and H.-Y. M. Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection," 2020.

[30] G. Jocher, "YOLOv5," [Online]. Available: https://github.com/ultralytics/yolov5.

[31] J. Nelson and J. Solawetz, "YOLOv5 is Here: State-of-the-Art Object Detection at 140 FPS," 10 June 2020. [Online]. Available: https://blog.roboflow.com/yolov5-is- here/.

[32] SLAMTEC, "Specifications of RPLidar A1M8," [Online]. Available: http://www.slamtec.com/en/Lidar/A1Spec.

[33] J. Perron, “ROS Driver for iRobot Create 1 and 2,” 2020. [Online]. Available: https://github.com/AutonomyLab/create_robot.

[34] Slamtec, “ROS Driver for RPLidar,” 2019. [Online]. Available: https://github.com/Slamtec/rplidar_ros.

[35] Wikipeida, "Precision and Recall," [Online]. Available: https://en.wikipedia.org/wiki/Precision_and_recall.

[36] H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid and S. Savarese, "Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression," in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.

[37] D. Kobran and D. Banys, "Epochs, Batch Size, & Iterations," 2020. [Online]. Available: https://docs.paperspace.com/machine-learning/wiki/epoch.

[38] Matlab, "What Is Camera Calibration?," [Online]. Available: https://www.mathworks.com/help/vision/ug/camera-calibration.html.

[39] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed., CambridgeUK: Cambridge University Press, 2003.


[40] OpenCV, "Camera Calibration," [Online]. Available: https://opencv-python- tutroals.readthedocs.io/en/latest/py_tutorials/py_calib3d/py_calibration/py_calibrat ion.html.

[41] OpenCV, "Pose Estimation," [Online]. Available: https://docs.opencv.org/master/d7/d53/tutorial_py_pose.html.

[42] MEGVII, "Camera-Laser Calibration," [Online]. Available: https://zhuanlan.zhihu.com/p/137501892.

[43] Q. Zhang and R. Pless, "Extrinsic Calibration of a Camera and Laser Range Finder (Improves Camera Calibration)," 2004.

[44] Open Robotics, "Move_base Package Summary," 2020. [Online]. Available: http://wiki.ros.org/move_base.

[45] S. Bianco, R. Cadene, L. Celona and P. Napoletano, "Benchmark Analysis of Representative Deep Neural Network Architectures," IEEE Access, vol. 6, pp. 64270-64277, 2018.

[46] D. Jiang, "An Overview of YOLO," 2020. [Online]. Available: https://zhuanlan.zhihu.com/p/143747206.

[47] H. P. Moravec, "Sensor Fusion in Certainty Grids for Mobile Robots," AI Magazine, vol. 9, no. 2, pp. 61-74, 1988.

[48] R. Yamashita, M. Nishio, R. K. G. Do and K. Togashi, "Convolutional Neural Networks: an Overview and Application in Radiology," Insights into Imaging, vol. 9, no. 4, pp. 611-629, 2018.

[49] OpenCV, "Camera Calibration and 3D Reconstruction," [Online]. Available: https://docs.opencv.org/master/d9/d0c/group__calib3d.html.


APPENDIX A: INVOLVED LIBRARIES AND PACKAGES

1. Robot driver dependency: ros-melodic-libcreate or https://github.com/AutonomyLab/libcreate

2. Robot driver: https://github.com/AutonomyLab/create_robot

3. LiDAR driver: https://github.com/Slamtec/rplidar_ros

4. SLAM stack: ros-melodic-gmapping or https://github.com/ros-perception/slam_gmapping

5. Navigation stack: ros-melodic-move-base or https://github.com/ros-planning/navigation/tree/melodic-devel

6. YOLOv5: https://github.com/ultralytics/yolov5

7. Packages for the program: https://drive.google.com/file/d/1LVQlXIBCLRlqx1VRqwhliIIiy5DJHHWW/view?usp =sharing or https://github.com/gloria-201/create_toycar after installing necessary packages


APPENDIX B: PROCEDURES OF DRIVING THE ROBOT AND THE LIDAR

# Create a catkin workspace
$ mkdir -p ~/catkin_ws/src
$ cd ~/catkin_ws
$ catkin init

# Clone the repositories and install dependencies
$ cd ~/catkin_ws/src
$ git clone http://github.com/jacobperron/create_robot.git
$ git clone https://github.com/Slamtec/rplidar_ros.git
$ cd ~/catkin_ws
$ sudo apt install ros-melodic-libcreate
$ rosdep update
$ rosdep install --from-paths src -i

# Build
$ cd ~/catkin_ws
$ catkin build

# Grant USB permissions with the following command, then log out and log back in for the permission to take effect
$ sudo usermod -a -G dialout $USERNAME

# Source the workspace
$ source ~/catkin_ws/devel/setup.bash

# Connect the robot and the LiDAR to the laptop using the corresponding connection cables
# Launch them with their .launch files in new terminals, respectively
