Survey Of Robust Multisensor Datafusion Techniques For LiDAR And Optical Sensors On Autonomous Navigation

Prasanna Kolar, Patrick Benavidez and Mohammed Jamshidi Department of Electrical and Computer Engineering The University of Texas at San Antonio One UTSA Circle, San Antonio, TX 78249, USA Email: [email protected], [email protected], [email protected]

Abstract—A system is only as good as its sensors, and a sensor is only as good as the data it measures. Accurate, optimal sensor data can be used in autonomous system applications such as environment mapping, obstacle detection and avoidance, and similar tasks. In order to obtain such accurate data, we need optimal techniques to read the sensor data, process it, eliminate the noise and then utilize it. In this paper we present a survey of current data processing techniques that implement data fusion using sensors such as LiDAR, stereo/depth cameras and RGB monocular cameras, and we describe how this fused data is used in tasks like obstacle detection and avoidance and localization. In the future, we plan to implement a state-of-the-art fusion system on an intelligent wheelchair controlled by human thought, without intervention of any of the user's motor skills. We propose an efficient and fast algorithm that learns the projection from the camera space to the LiDAR space and outputs camera data in the form of LiDAR detections (distance and angle), together with a multi-sensor, multi-modal detection system that fuses the camera and LiDAR detections to obtain better accuracy and robustness.

Index Terms—Riemannian Geometry, Minimum Distance to Mean, Deep Learning, Convolutional Neural Networks, Sliding Window, Internet of Things, Smart City, Smart Community

* ACE Lab, Department of Electrical Engineering

1. INTRODUCTION

Autonomous systems play a vital role in our daily life, in a wide range of applications like driverless cars, humanoid robots, assistive systems, domestic systems, military systems and manipulator systems, to name a few. Assistive robotic systems are a crucial area of autonomous systems that help people in need of medical, mobility, domestic, physical and mental assistance, and they are gaining popularity in domestic applications like autonomous wheelchair systems [1], [2], autonomous walkers [3], lawn mowers [4], [5], vacuum cleaners [6], intelligent canes [7] and surveillance systems. The present research focuses on the development of state-of-the-art techniques using LiDAR and camera, which will in turn be used as part of an object detection and obstacle avoidance mechanism in an intelligent wheelchair for persons with limited or no motor skills. In this survey, we concentrate on using sensors for autonomous navigation of the wheelchair; the main sensors used for obstacle detection are the LiDAR and the camera. As we will see in the upcoming sections, these two sensors can complement each other and hence are being used extensively for detection in autonomous systems. The LiDAR and camera market is expected to reach USD 52.5 Billion by the year 2032, according to a recent survey by the Yole group cited in a write-up by the "First Sensors" group. The fusion of these sensors is playing a significant role in perceiving the environment in many applications, including the autonomous domain. Reliable fusion is also critical to the safety of these technologies. Many challenges lie ahead, and this remains one of the exciting problems in the industry.

The navigation of an autonomous system typically comprises three important components, namely:
a) Mapping system: senses and understands the environment the system is in.
b) Localization system: informs the robot of its current position at any given time.
c) Obstacle avoidance system: keeps the vehicle from running into obstacles and keeps it in a safe zone.

The navigation system is also responsible for the decision-making capability of the robot when it faces situations that demand negotiating with humans and/or other robots. This research focuses on the obstacle avoidance module of autonomous navigation.

Efficient mapping is a critical process that supports accurate localization and driving decisions in the autonomous system. The mechanical system chosen to present the numerical scheme has several features. This system will be an intelligent power wheelchair capable of semi-autonomous driving in a known environment. It utilizes a range of sensors such as cameras, Light Imaging Detection and Ranging (LiDAR), ultrasound sensors, navigation sensors etc. Each sensor has its preferred usage: we choose LiDARs as they are well known for high-speed sensing and are used for long-range sensing and long-range mapping, while depth cameras and stereo cameras can be used for short-range mapping and to efficiently detect obstacles, pedestrians [8] etc.

Obstacle avoidance during navigation is a critical component of autonomous systems: autonomous vehicles must be able to navigate their environment safely. While path planning requires the vehicle to move in the direction nearest to the goal, with the map of the area generally known, obstacle avoidance entails selecting the best direction among several unobstructed directions in real time.

As mentioned earlier, this publication is limited to developing a state-of-the-art framework that uses a LiDAR and a depth camera to provide robust data fusion for object detection, object identification and avoidance as required. Why do we need multiple sensors? Every sensor provides a different type of information about the selected environment, which includes the tracked object, the avoided object, the autonomous vehicle itself, the world it operates in, and so on, and this information comes with differing accuracy and differing detail. Because multiple sensors are used, an optimal technique for fusing the data to obtain the best information at any time is essential. Multiple sensor fusion has been a topic of research for several decades; there is a dire need to optimally combine information from different views of the environment to get an accurate model of the environment at hand. The most common process is to combine redundant and complementary measurements of the environment. The information required by the intelligent system cannot be obtained by a single sensor alone, due to its limitations and uncertainty.

Fig. 1: Concepts of perception

The tasks of a navigation system, namely mapping, localization, object detection and tracking, can also be interpreted as the following processes:
• Mapping: a process of establishing spatial relationships among stationary objects in an environment
• Localization: a process of establishing the spatial relationship between the intelligent system and the stationary objects
• Object detection: a process of identifying objects that are present in the environment
• Mobile object tracking: a process of establishing temporal and spatial relationships between the intelligent system, the mobile objects in the environment and the stationary objects.

These tasks vary in several ways, and therefore a single sensor will not be able to provide all the information necessary to perform them optimally. Hence we need multiple sensors that may be redundant but are complementary and can provide the required information to the perception module in the intelligent system. The perception module therefore uses information from sensors like LiDAR, camera, ultrasonic etc.; we detail these sensors and the above tasks in the following sections. Combining information from several sensors is a challenging problem and the current state of the art [9], [10]. Every sensor has an amount of noise that is inherent to its properties. There have been many attempts at reducing or removing this noise, for instance in object detection [11] and background noise removal [12]. In this survey we discuss filtering noise using the Kalman filter, which is over five decades old and is one of the most sought-after filtering techniques; we cover two flavors of it, namely the Extended Kalman Filter and the Unscented Kalman Filter. We also compare different levels of sensor data fusion:
1) Decision or high-level fusion
2) Feature or mid-level fusion
3) Raw-data or low-level fusion

Decision or high-level fusion: at the highest level, the system decides the major tasks and takes decisions based on the fusion of information that is input from the system features.
Feature or mid-level fusion: at the feature level, feature maps containing lines, corners, edges and textures are integrated, and decisions are made for tasks like obstacle detection, object recognition etc.
Raw-data or low-level fusion: at this most basic level, better or improved data is obtained by integrating raw data directly from multiple sensors so that it can be used in subsequent tasks. The combined raw data contains more information than the individual sensor data.

We have not performed an exhaustive review of data fusion, since there has been extensive research in this area; instead we summarize the most common data fusion techniques and the steps involved.

This paper is organized as follows. Section 1 gives a brief introduction to data fusion. Section 2 details the accomplishments in the area of perception, the benefits of data fusion and the usefulness of multiple sensors. Section 3 gives details on the design of the system, while Section 4 discusses some sample data fusion techniques. Section 6 details sensor noise and noise filtering using techniques such as Kalman filtering, while Section 7 gives the architecture and framework of the proposed system and discusses some previous methods. Section 8 describes the methodology. Section 9 discusses the hardware. Section 10 details the detection algorithm for object detection that could be used in the data fusion. Section 12 gives the conclusions and future plans.
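To make the decision-level (high-level) fusion idea introduced above concrete, the sketch below merges per-sensor detections by spatial proximity and combined confidence. It is a minimal illustration, not the system described in this survey; the `Detection` fields and the 0.5 m association gate are assumptions chosen for the example.

```python
from dataclasses import dataclass
from math import hypot

@dataclass
class Detection:
    x: float          # position in a common vehicle frame (metres)
    y: float
    confidence: float
    source: str       # "lidar", "camera" or "fused"

def fuse_decisions(lidar_dets, camera_dets, gate=0.5):
    """Decision-level fusion: pair LiDAR and camera detections that fall
    within `gate` metres of each other and combine their confidences;
    unmatched detections are kept as single-sensor evidence."""
    fused, used = [], set()
    for ld in lidar_dets:
        best, best_d = None, gate
        for j, cd in enumerate(camera_dets):
            d = hypot(ld.x - cd.x, ld.y - cd.y)
            if j not in used and d < best_d:
                best, best_d = j, d
        if best is not None:
            cd = camera_dets[best]
            used.add(best)
            conf = 1.0 - (1.0 - ld.confidence) * (1.0 - cd.confidence)
            fused.append(Detection((ld.x + cd.x) / 2, (ld.y + cd.y) / 2, conf, "fused"))
        else:
            fused.append(ld)
    fused.extend(cd for j, cd in enumerate(camera_dets) if j not in used)
    return fused

# usage: one object seen by both sensors, one seen only by the camera
lidar = [Detection(2.0, 0.10, 0.7, "lidar")]
camera = [Detection(2.1, 0.05, 0.6, "camera"), Detection(5.0, 1.0, 0.9, "camera")]
print(fuse_decisions(lidar, camera))
```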

2. PREVIOUS WORK

Robot navigation has been extensively studied in the community for the past several decades. As mentioned in Section 1, we broadly divide robot navigation into mapping, localization and obstacle avoidance. There are various mapping techniques, of which three stand out: topological, metric and hybrid. Topological mapping is based on connections, while metric maps are distance based. Feature/landmark based and dense surface information based maps are types of metric maps. While the landmark approach needs feature identification or engineering of the environment, the dense technique relies entirely on the sensors to create the map; these sensors create a geometric representation of the environment surfaces [13]–[16]. New techniques in the area of mapping and localization have been developed over the last few decades. Many of these techniques incrementally and iteratively build maps and localize the robot for every new sensor scan that the robot accepts [13], [15]. The drawback of these techniques is their failure when large cyclical (open-loop) scan environments are involved, even though they are typically fast. Cyclical environments produce cumulative errors that can grow exponentially and without bound: if this error has to be corrected backwards in time, it is time consuming and several systems may not achieve acceptable results. S. Thrun et al. (2000-02) presented a novel algorithm which is strictly incremental in its approach [17], [18]. The basic idea is to combine posterior estimation with incremental map construction using maximum likelihood estimators [19], [20]. This resulted in an algorithm that can build large maps in cyclical environments in real time, even on a low-footprint computer such as a micro-computer (e.g. an Odroid XU4). The posterior estimation approach enables robots to localize themselves globally in maps developed by other robots, making it possible to fuse data collected by more than one robot at a time. They extended their work to generate 3D maps, where multi-resolution algorithms are utilized to generate low-complexity 3D models of indoor environments. The map is the set of scans and their poses,

m_t = {⟨o_τ, ŝ_τ⟩}, where τ = 0, 1, 2, 3, ..., t    (2-1)

where o_τ is a laser scan, ŝ_τ is the laser scan's pose, and τ is a time index. The map is obtained by maximizing

arg max_m P(m | d^t)    (2-2)

where the data d^t is a sequence of LiDAR measurements and odometry readings,

d^t = {s_0, a_0, s_1, a_1, ..., s_t, a_t}

where s_τ denotes an observation (laser range scan), a_τ denotes an odometry reading, and t and τ are time indexes. It is assumed that observations and odometry readings alternate. They assume that when a robot receives a sensor scan, it is unlikely that an obstacle will be perceived in future measurements where it previously perceived free space; the likelihood is inversely proportional to the distance between previous and current measurements. The pose of each new scan is found as

ŝ_t = arg max_{s_t} P(s_t | o_t, a_{t-1}, ŝ_{t-1})    (2-3)

and the result is determined using a gradient ascent algorithm. The result of the search, ŝ_t, and its corresponding scan o_t are appended to the map.
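A minimal sketch of the pose search in Eq. (2-3) is given below. Greedy hill climbing over (x, y, θ) stands in for the gradient ascent described above, and a simple occupancy-grid hit count stands in for the full scan likelihood; the grid resolution, step sizes and the synthetic usage data are assumptions made for the example, not details of the cited algorithm.

```python
import numpy as np

def scan_endpoints(pose, ranges, angles):
    """Project a 2D laser scan (ranges, beam angles) into world coordinates for pose (x, y, theta)."""
    x, y, th = pose
    return x + ranges * np.cos(th + angles), y + ranges * np.sin(th + angles)

def scan_score(pose, ranges, angles, occ_grid, res=0.05):
    """Count how many scan endpoints fall on cells already marked occupied in the map."""
    xs, ys = scan_endpoints(pose, ranges, angles)
    ix, iy = (xs / res).astype(int), (ys / res).astype(int)
    inside = (ix >= 0) & (ix < occ_grid.shape[0]) & (iy >= 0) & (iy < occ_grid.shape[1])
    return occ_grid[ix[inside], iy[inside]].sum()

def search_pose(odom_pose, ranges, angles, occ_grid,
                steps=((0.05, 0.0, 0.0), (0.0, 0.05, 0.0), (0.0, 0.0, 0.02))):
    """Greedy hill climbing around the odometry-predicted pose: keep applying
    whichever +/- step improves the scan score until no step helps."""
    pose = np.asarray(odom_pose, dtype=float)
    best = scan_score(pose, ranges, angles, occ_grid)
    improved = True
    while improved:
        improved = False
        for step in steps:
            for sign in (+1, -1):
                cand = pose + sign * np.asarray(step)
                s = scan_score(cand, ranges, angles, occ_grid)
                if s > best:
                    pose, best, improved = cand, s, True
    return pose

# usage: a synthetic map (a wall of occupied cells from earlier scans) and a noisy odometry guess
occ = np.zeros((200, 200))
occ[78:82, 60:140] = 1.0
angles = np.linspace(-1.0, 1.0, 81)
ranges = 1.0 / np.cos(angles)                 # beams hitting a wall roughly 1 m ahead
refined = search_pose((2.95, 5.0, 0.0), ranges, angles, occ)
```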
Obstacle detection, identification and avoidance is a fundamental and extensively studied topic in the area of autonomous systems. Any autonomous system, or any system with an autonomous navigation function, must be aware of the presence of obstacles. When such a system provides human assistance, the obstacle problem becomes even more critical, since there is zero tolerance for failure. Objects are detected, identified and deemed obstacles by the system. Obstacles can be either static or mobile. If an obstacle is static, the problem reduces to detecting its present position and avoiding it. If the obstacle is mobile, the autonomous system should not only know where the obstacle currently is but also track where it could be in the near future. This prompts us to perceive obstacles as dynamic entities, which makes obstacle avoidance a complex task.

Many different approaches exist for solving the obstacle avoidance problem; some commonly used approaches are the Vector Field Histogram (VFH) [21], the Dynamic Window Approach [22], occupancy grid algorithms [23], [24] and the potential field method [25].

In order to operate efficiently, the autonomous vehicle needs accurate data from each of its sensors. The reliability of operation of an autonomous vehicle is hence proportional to the accuracy, and therefore the quality, of the associated sensors. Each type of sensor has its own limitations, for example:
• LiDAR: weather phenomena such as rain, snow, fog [26]
• Stereo camera: distance from target, baseline [27]
• Ultrasound: pollutants [28]

Due to the limitations of each of these sensors when used individually, one tends to utilize a suite of different sensors in order to get acceptable results and exploit the benefits of each of them. The diversity offered by this suite of sensors contributes positively to perception of the sensed data [29], [30].

Sensor data fusion is effective whenever multiple sensors (homogeneous or heterogeneous) are utilized. It is not limited to the field of robotics [31]; surveillance [32], gesture recognition [33], smart canes [7] and guiding glasses [34] use this concept efficiently. The effective temporal, spatial and geometrical alignment of this suite of heterogeneous sensors, and the utilization of their diversity, is termed sensor data fusion [29], [30].

Depth perception cameras provide limited depth information in addition to data-rich image data.

Although cameras have the advantage of providing extremely rich data, almost equivalent to the human eye, they need significantly complex machine vision techniques which require high computing power. In addition to this challenge, their operational limitations can be attributed to the need for adequate lighting and visibility. Cameras are used very efficiently for sign recognition, pedestrian detection, lane departure detection, identification of objects and headlamp control. Cameras are also much cheaper compared to radars or lidars [35]; hence the community prefers them over other sensors in certain applications.

Both LiDARs and depth cameras contain depth-sensing elements. While the cameras estimate depth information using disparity information in the image, the lidar generates depth information from the environment directly. Each sensor has its pros and cons: the depth cameras provide rich depth information but their field of view is quite narrow, whereas the lidars have an excellent field of view but do not provide rich environment information, giving sparse information instead [31], [34], [36]. The lidar provides information in the form of a point cloud while the camera gives luminance. We can see that these sensors can complement each other and can be used in complex applications; this is the advantage that we focus on in this study.

Huber et al. studied LiDAR and camera integration [37] and found that the sparse information in the LiDAR may not be useful for complex applications and that data fusion with a sensor that has rich information is useful. They found that stereo performs poorly on areas without texture and on scenes containing repetitive structures, and that subsequent fusion with LIDAR then leads to a degraded estimation of the 3D structure. They showed that fusing the LiDAR data directly into the depth camera reduces false positives and increases the disparity map density on textureless surfaces.

Trajectory optimization is another area of autonomous systems that has made headway in recent times: Balkcom and Mason [38] and Kolmanovsky [39] worked with non-holonomic systems such as differential drive robotic devices and produced benchmark systems. Trajectory planning is out of scope for this publication.

3. DESIGN

Robot navigation is studied in this survey. The availability of new-age sensors, advanced computing hardware, and advanced algorithms for processing and fusion has made the extremely complex task of information fusion possible. Decision making relies on data fusion, which comprises combining inputs from various sources in order to get more accurate combined sensor data as output. At the primary level, navigation of the wheelchair comprises accepting a command from the user to safely go to a target location. The user could command the wheelchair to use mapped or unmapped navigation, some examples of which are listed below:
• Mapped navigation
1) Travel to named locations, e.g. go to the kitchen
2) Travel to known objects, e.g. go to the sofa
3) Travel to unknown locations, e.g. go to a restroom in an unknown office location
4) Travel to unknown objects, e.g. go to the cafeteria in an unknown area
5) Travel to implied locations, e.g. go to the dining table if the user thinks of food
• Unmapped navigation
1) Travel by following static or moving objects
2) Travel using metric-navigation commands
3) Travel by speed variations

As mentioned previously, the three main tasks involved in autonomous navigation of intelligent mobile systems are as follows:
1) Mapping [18], [40]
2) Localization [40]
3) Obstacle Avoidance [21], [41]
Each area is detailed in the subsections below.

A. Mapping

Mapping for autonomous mobile vehicles like SmartWheels is a discipline related to computer vision [18], [40] and cartography [42]. One goal for the wheelchair could be to develop a map of the environment using the onboard sensors, and another could be to utilize the constructed map of the environment. Constructing a map can be exploratory [43], without the use of any pre-existing mapping information, or can utilize an existing floor plan that details the presence of walls, floors, ceilings etc. Using the techniques of exploratory navigation [43], the wheelchair can develop the map and continue to navigate; if the floor plan is available, the wheelchair can create the map by traversing the building floor and localizing itself. In order to map the environment, a lidar is used, which provides a 3-dimensional point cloud of the environment in which the robot is situated. Hence we can define robotic mapping as the branch of robotics that deals with the study and application of the ability of the autonomous robot to construct the map or floor plan of the environment in which it is situated, using its sensors. The area of mapping that deals with actively mapping the environment while simultaneously localizing the robot within it is called Simultaneous Localization and Mapping (SLAM) [44]–[48]. There are various flavors of SLAM; however, SLAM is out of scope of this survey.
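To make the mapping idea above concrete, here is a minimal sketch that folds a single 2D LiDAR scan, taken at a known pose, into an occupancy-grid style map. The grid size, resolution and the simple "mark the endpoint cell occupied" update are illustrative assumptions, not the mapping method of any cited work.

```python
import numpy as np

def update_occupancy_grid(grid, pose, ranges, angles, res=0.05, max_range=6.0):
    """Mark the cell hit by each LiDAR beam as occupied.

    grid   : 2D numpy array of occupancy values (0 = free/unknown, 1 = occupied)
    pose   : (x, y, theta) of the sensor in the grid frame, metres/radians
    ranges : measured beam ranges (metres); angles : beam angles (radians)
    """
    x, y, th = pose
    valid = (ranges > 0.0) & (ranges < max_range)   # drop invalid or out-of-range beams
    ex = x + ranges[valid] * np.cos(th + angles[valid])
    ey = y + ranges[valid] * np.sin(th + angles[valid])
    ix, iy = (ex / res).astype(int), (ey / res).astype(int)
    inside = (ix >= 0) & (ix < grid.shape[0]) & (iy >= 0) & (iy < grid.shape[1])
    grid[ix[inside], iy[inside]] = 1.0
    return grid

# usage: a 10 m x 10 m grid at 5 cm resolution and a synthetic circular scan
grid = np.zeros((200, 200))
angles = np.linspace(-np.pi, np.pi, 360, endpoint=False)
ranges = np.full(360, 2.0)                          # obstacles 2 m away in every direction
grid = update_occupancy_grid(grid, (5.0, 5.0, 0.0), ranges, angles)
```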

B. Localization

Localization is one of the most fundamental competencies required by any autonomous system, especially an autonomous vehicle, as knowledge of the vehicle's own location is an essential precursor to any decision about future actions, whether planned or unplanned. In a typical localization situation, a map of the environment or world is available and the robot is equipped with sensors that sense and observe the environment as well as monitor the robot's motion [49]–[51]. Hence localization can be termed as the branch of autonomous system navigation that deals with the study and application of the ability of a robot to localize itself within a map or plan.

There are several methods that could implement localization:

a) Dead Reckoning: Dead reckoning uses odometry data, trigonometric and robotic kinematic algorithms to determine the distance travelled by the robot from its initial position. However, there are two major issues that impact its performance: the robot has to know its initial position, and time-measurement-related errors affect the accuracy, which sometimes falls below acceptable levels. Sebastian Thrun et al. [17] used a probabilistic method, known as particle filtering, to reduce the errors. Others used the Extended Kalman Filter [52] and similar techniques to reduce the errors. Researchers have used sensors like IMUs to perform dead-reckoning [53], [54], while others used ultrasonic sensors with Kalman filters to improve the measurements [13].

b) Signal-based localization: There are several sensors that communicate via signals [55], of which Radio Frequency Identification (RFID) [56], [57], WiFi [58] and Bluetooth [59] are a few. In this technique, the positions of a network of nodes are identified based on distance estimates between them.

c) Global Positioning: Outdoor navigation is involved in cases such as outdoor search and rescue missions. Localization in such cases involves Global Positioning Systems (GPS), which work efficiently only outdoors. GPS technology was first developed by NAVSTAR [60] and is one of the favorite technologies to date for outdoor navigation. Some of the GPS companies are Navstar, Garmin, TomTom and Mobius, to name a few. GPS is very accurate (normally to within 1 meter), and some advanced GPS units provide accuracy up to 2 centimeters, like the Mobius agriculture mapping system [61], which is used on the autonomous tractors developed by Case New Holland Industrial.

d) Network-of-sensors localization: A sensor network comprises several sensors that can communicate either wirelessly or over wires. Choi et al. combined RFID tags with an external camera to monitor the robot [31]. In some cases ceiling-mounted cameras were used to improve localization when odometry data was fused with LiDAR [62]; the camera was used to locate obstacles and also to aid in the initial position estimation.

e) Vision-based localization: Sensors mounted on the robot provide the latest and most accurate data with respect to the robot. This system of sensors can be generalized to different environments and robots, and hence is sought after in present research. Outdoor environments can be supported by single or multiple sets of GPS and are fairly accurate, while indoor environments use LiDAR sensors [63] and/or vision-based sensors [64], [65].

f) Indoor VR localization: Indoor localization using new-age technologies like Virtual Reality headsets and 3D laser sensors is on the rise. One such example is the HTC Vive [66] Lighthouse technology. This system floods a room with light invisible to the naked eye; the Lighthouse functions as a reference point for any positional tracking device (like a VR headset or a game controller) to figure out where it is in real 3D space. The Lighthouse shoots light into the world to assist receiving systems in localizing themselves. The receivers, which are tiny photo sensors that detect the flashes and the laser light, are placed at various locations on the vehicle, in this case the wheelchair. When a flash initiates, the receiver starts counting until the photo sensor on it is hit by a laser beam, and it uses the relationship between where that photo sensor sits on the wheelchair and when the beam hit it to mathematically calculate its exact position relative to the base stations in the room. When enough of the photo sensors are hit by a laser at the same time, they form a pose that provides the position and the direction of the wheelchair. This is called an inside-out tracking system, since the headset uses external signals to figure out where it is.

C. Obstacle Avoidance

For successful navigation of an autonomous system, avoiding obstacles while navigating is an absolute requirement: the vehicle must be able to navigate its environment safely. Path planning requires one to go in the direction closest to the goal, and generally the map of the area is already known. On the other hand, obstacle avoidance involves choosing the best direction among multiple unobstructed directions in real time; hence obstacle avoidance can be considered more challenging than path planning.

Obstacles can be of two types: (i) immobile obstacles and (ii) mobile obstacles. Static object detection deals with localizing objects that are immobile in the environment, e.g. a table or a large bed, while moving object detection deals with localizing dynamic objects through different data frames obtained by the sensors in order to estimate their future state, e.g. pets at home, moving persons, small tables, chairs etc. The object's state has to be updated at each time instance. Moving object localization is not a simple task even with precise localization information, and the challenge increases when the environment is cluttered with obstacles. The obstacles can be detected using two approaches that rely on prior mapped knowledge of the targets or the environments [9], [67], [68]: (i) the feature-based approach, which uses LiDAR and detects the dynamic features of the objects, and (ii) the appearance-based approach, which uses cameras and detects moving objects or temporally static objects.

Recent research has produced two fundamental paradigms for modeling indoor robot environments: the grid-based paradigm and the topological paradigm. Grid-based approaches [21], [69], [70] represent the robot environment as evenly-spaced grids; each grid cell may contain a representation of an obstacle or of a free path to the target, as applicable. Topological approaches [71]–[73] represent robot environments as graphs: the nodes represent situations, areas, or objects (landmarks such as doorways, windows, signboards), and the nodes are interconnected by arcs if the two nodes have a direct path between them. Both of these mapping paradigms have demonstrated orthogonal strengths and weaknesses.

Occupancy grids are easy to construct and maintain in large-scale environments [19], [74] and establish different areas based on the robot's geometric position within a global coordinate frame. The position of the robot is incrementally estimated using the odometric information and the sensor readings it takes; thus the number of sensor readings used to determine the robot's location is unbounded. Contrary to this, topological approaches determine the position of the robot relative to the model primarily based on the environment's landmarks or distinct, temporal sensor features. For example, if the robot traverses two places that seem identical, topological approaches often have difficulty determining whether these places are the same or not, especially if they have been approached through different paths. Also, since sensory input usually depends strongly on the robot's viewpoint, topological approaches may fail to recognize geometrically nearby places when the sensory input is ambiguous, even in static environments, making it difficult to construct large-scale maps. Grid-based approaches, in contrast, are hampered by their enormous space and time complexity, because the resolution of a grid must be fine enough to capture the details of the robot's world. This limitation is reduced in topological approaches by their compactness: the resolution of topological maps corresponds directly to the complexity of the environment. The compactness of topological representations gives them three key advantages over grid-based approaches: (i) fast planning, (ii) interfacing to symbolic planners and problem-solvers, and (iii) natural interfaces for human speech-like instructions (such as "go to kitchen"). Topological maps also recover early from slippage and drift, since they do not require the exact determination of the geometric position of the robot, which must constantly be monitored and compensated in a grid-based approach.

4. DATA FUSION TECHNIQUES

Sensing is one of the most abundantly available capabilities for existence in nature. In the animal kingdom, for example, it can be seen as a seamless integration of data from various sources, some overlapping and some non-overlapping, producing information that is reliable and feature-rich and can be used in fulfilling goals. In nature, this capability is essential for survival, be it for communication, understanding or even existence. Many times some sensory systems may not be available, and even in such cases the animal is able to re-use information obtained from sensory systems with some form of overlapping usage. As an example from wildlife, consider bears [75] and compare their sensory capabilities: they have sharp close-up color vision but do not have good long-distance vision. However, their hearing is excellent, because they have the capability to hear in all directions, and their sense of smell is extremely good. They use their paws very dexterously to manipulate a wide range of objects, from picking little blueberries to lifting huge rocks. Often bears touch objects with their lips, noses and tongues to feel them, so we can surmise that their sense of touch is very good. Surely they combine signals from the five body senses (sound, sight, smell, taste and touch) with information about the environment they are in, and create and maintain a dynamic model of the world. At the time of need, for instance when a predator is around, a bear prepares itself and takes decisions regarding its current and future actions [75]. Over the years, scientists and engineers have applied concepts of such fusion to technical areas and have developed new disciplines and technologies that span several fields.

Waltz's book "Multisensor Data Fusion" [76] and Llinas and Hall's "Mathematical Techniques in Multisensor Data Fusion" [77] propose an extended term, "multisensor data fusion". These books define it as a technology concerned with how to combine data from multiple (and possibly diverse) sensors in order to make inferences about a physical event, activity, or situation. The International Society of Information Fusion defines information fusion as [78]:

"Information Fusion encompasses theory, techniques and tools conceived and employed for exploiting the synergy in the information acquired from multiple sources (sensor, databases, information gathered by human, etc.) such that the resulting decision or action is in some sense better (qualitatively or quantitatively, in terms of accuracy, robustness, etc.) than would be possible if any of these sources were used individually without such synergy exploitation."

A subset of information fusion, the term sensor fusion, is introduced by Elmenreich [79] as:

"Sensor Fusion is the combining of sensory data or data derived from sensory data such that the resulting information is in some sense better than would be possible when these sources were used individually."

There are homogeneous sensor data and heterogeneous sensor data; heterogeneous sensor data comes from different types of sensing equipment, such as optical, auditory, EEG etc. This study was performed on LiDAR and stereo camera. Systems with multi-sensor fusion can provide many benefits compared with single-sensor systems. This is because all sensors suffer from some form of limitation, which could lead to overall malfunction or limited functionality of the control system in which they are incorporated. Following are some of the limitations of single-sensor systems:
1) Sensor deprivation: if a sensor breaks down, the system in which it was incorporated suffers a loss of perception.
2) Uncertainty: inaccuracies arise when features are missing, due to ambiguities, or when all required aspects cannot be measured.
3) Imprecision: the sensor measurements are limited to the precision of the employed sensor.
4) Limited temporal coverage: there is initialization/setup time for a sensor to reach its maximum performance and transmit a measurement, limiting the maximum measurement frequency.
5) Limited spatial coverage: normally an individual sensor covers only a limited region of the entire environment; for example, a reading from an ambient thermometer on a drone provides an estimate of the temperature near the thermometer and may fail to correctly render the average temperature in the entire environment.

The problems stated above can be mitigated by using a suite of sensors, either homogeneous or heterogeneous [80], [81]. Following are some of the advantages of using multiple sensors or a sensor suite:

1) Extended spatial coverage: multiple sensors can measure across a wider range of space and sense where a single sensor cannot.
2) Extended temporal coverage: time-based coverage increases when using multiple sensors.
3) Improved resolution: with a union of multiple independent measurements of the same property, the resolution is better than that of a single sensor measurement.
4) Reduced uncertainty: considered as a whole, the uncertainty of the sensor suite decreases, since the combined information reduces the set of ambiguous interpretations of the sensed value.
5) Increased robustness against interference: by increasing the dimensionality of the sensor space (e.g. measuring with a LiDAR and stereo cameras), the system becomes less vulnerable to interference.
6) Increased robustness: the redundancy provided by multiple sensors gives increased robustness, even when there is partial failure because one of the sensors is down.
7) Increased reliability: due to the increased robustness, the system becomes more reliable.
8) Increased confidence: when the same domain or property is measured by multiple sensors, one sensor can confirm the accuracy of the others; this can be attributed to re-verification.

N. S. Rao et al. [82] provide metrics comparing the difference(s) between single-sensor and multi-sensor systems. They state that if the distribution function depicting the measurement errors of one sensor is precisely known, an optimal fusion process can be developed, and this fusion process performs similarly to, if not better than, a single sensor.

One of the advantages of sensor fusion is the reduction of system complexity; this is because the output of sensor fusion is better: less uncertain, less noisy and more complete. Users can be reassured that the fused data is better than that of single sensors, and since the sensing layer is better, the control application can be standardized independently.

Sensor data fusion using LiDAR and optical sensor data is studied in this survey, and we discuss two of the fundamental issues surrounding sensor data fusion: the resolution differences between the heterogeneous sensors, and understanding and utilizing the heterogeneous sensor data streams while accounting for the many uncertainties in the sensor data sources. Emphasis is placed on utilizing this fused information in the navigation of autonomous wheelchairs. This is challenging since autonomous wheelchairs work in complex environments, be it at home or at work, to assist persons with severe motor disabilities in handling their navigational requirements, and hence pose significant challenges for decision making due to the safety, efficiency and accuracy requirements.

For reliable operation, decisions in the system need to be made by considering the entire set of multi-modal sensor data acquired, keeping in mind a complete solution. In addition, the decisions need to be made considering the uncertainties associated with both the data acquisition methods and the implemented pre-processing algorithms. Our focus in this paper is to survey the data fusion techniques and discuss the development of more robust approaches for data fusion that consider the uncertainty in the fusion algorithm.

Before using any sensor, it needs to be calibrated. This sensor suite follows the same path, and the entire suite needs to be calibrated first. This ensures a levelled measurement, i.e., one where all the sensors can be fused uniformly. Both forms of calibration, extrinsic and intrinsic, were surveyed. We found that extrinsic calibration methods are not optimal if there are multiple autonomous units working together, e.g. in an old-age home where the systems must work together to share information about location, situation awareness etc.; this can be attributed to the variations that exist between sensors due to manufacturing differences, different mounted sensors and different autonomous system types. In such a setting, the calibration duration will be very high if the number of autonomous systems is high; in fact it could grow exponentially and hence become exorbitant and unacceptable. Reducing the calibration process and the intensity of calibration is essential, and that is also one of the topics we intend to explore in this research.
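As a small illustration of the "improved resolution" and "reduced uncertainty" benefits listed above, the sketch below fuses two noisy measurements of the same quantity by inverse-variance weighting; the fused variance is never larger than either input variance. The specific sensor noise values in the usage line are illustrative assumptions.

```python
def fuse_measurements(z1, var1, z2, var2):
    """Fuse two independent measurements of the same quantity.

    Returns the inverse-variance weighted estimate and its variance, which is
    always smaller than the smaller of the two input variances.
    """
    w1 = 1.0 / var1
    w2 = 1.0 / var2
    fused = (w1 * z1 + w2 * z2) / (w1 + w2)
    fused_var = 1.0 / (w1 + w2)
    return fused, fused_var

# e.g. a LiDAR range (low noise) and a stereo-derived range (higher noise)
z, var = fuse_measurements(2.04, 0.02**2, 1.95, 0.10**2)
print(z, var)   # estimate close to the LiDAR value, variance below 0.02**2
```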
5. CLASSIFYING TECHNIQUES AND METHODS OF DATA FUSION

Classification of data fusion is fuzzy and fluid, in that it is quite tedious and complex to follow and adhere to strict processes and methodologies. There are many criteria that can be used for the classification of data fusion. Castanedo [83] discussed the techniques and algorithms for state estimation, data association and, finally, higher-level decision fusion. Dasarathy's data fusion methods [84] discuss several techniques, Luo [29] discusses abstraction levels, and the JDL [85] does basic research in data fusion. Some of these classifications are given below:

• Data type of input (data) - output (information): Several types of classification emerged out of Dasarathy's input-output data fusion [84]. They can be termed as follows. Data-in:Data-out (DAI-DAO): raw data is input and raw data is extracted as output. Data-in:Feature-out (DAI-FEO): raw data is sourced, but the system provides features extracted from the data as output. Feature-in:Feature-out (FEI-FEO): features from previous steps of fusion or from other processes are fed into the fusion system, and better or higher-level features are output; this is also called feature fusion [84]. Feature-in:Decision-out (FEI-DEO): the features fed into the input system as sources are processed to provide decisions for tasks and goals as output; simple or high-level features are accepted as input, processed, and decisions are extracted for the system to follow. Most present-day fusion falls into this class. Decision-in:Decision-out (DEI-DEO): simple, lower-level decisions are accepted by the system, and higher-level, better decisions are produced as output; this type of fusion is also called decision fusion [84].

• Data abstraction level: In a typical perception system, one comes across the following abstractions of data: pixel, signal, symbol and feature characteristics [29]. Pixel-level classification is performed on image input from sensors like monocular cameras, stereo-depth cameras, IR cameras etc.; image processing used to improve tasks that look for and extract objects and object features uses this technique. Signal-level classification is performed on data involving signals from sensors like LiDAR, sonar, audio etc.; the signal data is operated on directly and the output rendered. Symbol-level classification is a technique that employs methods to represent information as symbols; this is similar to the decision-fusion technique of Dasarathy [84] and is termed the decision level. Characteristic-level classification extracts features from signals or images while processing the data and is termed the feature level.

• Data fusion levels as in the JDL: Data fusion models are divided into five processing layers, interconnected by a data bus to a relational database [85], [86].
Layer 0: processes source data comprised of pixels and signals. Information is extracted, processed, reduced and output to the higher layers.
Layer 1: data output from layer 0 is processed and refined here. Typical processes are alignment of the spatial-temporal information, correlation, clustering, association and grouping techniques, false-positive removal and reduction, state estimation and image feature data combination. Classification and identification, state and orientation are the typical outputs. This layer also performs input data transformation to obtain consistent and robust data structures.
Layer 2: based on the output of the layer 1 object refinement layer, analysis of the situation is performed. Based on the data input and the present and past decisions, the situation assessment is carried out. A set of high-level inferences is the outcome of this layer, and identification of events and activities is performed.
Layer 3: the output of layer 2, i.e. the significant activities and current events, is assessed for its impact on the system. Prediction of an outcome and threat analysis are performed at this layer.
Layer 4: the overall processes from layer 0 through layer 3 are optimized and improved. Resource control and management, task scheduling and prioritizing are performed to make improvements.

Fig. 2: Comparison of data fusion techniques

• Data source relationships: This type of classification uses the concepts of data redundancy, data complementing and data combination. Video data overlaps can be termed redundant data sources and can be optimized; this is the area of data-source classification wherein the same destination or target is identified by multiple data sources. Complementary data sources provide different inputs that can be combined to form a complete target, scene or object; for example, a complete scene is formed using different cameras, and the scene can be put together from the individual pieces. Combining data sources in a cooperative environment gives an end result that is more complex than the input source information.

• Data fusion based on system architecture: This classification deals with the system architecture and identifies where the data fusion is performed. The architecture could be hierarchical, distributed, decentralized, centralized etc. In a decentralized architecture there is no single system that performs the data fusion; instead, multiple systems are identified to perform the data fusion, and each system processes its own and its neighbors' data. The advantage is faster processing, since each system can process smaller chunks of data. The cons of this approach are the high communication costs, since several systems need to communicate with each other; the cost is on the order of n squared at each step of communication, where n is the number of nodes, and the process is costliest if each node has to communicate with every one of its peers. Contrary to this, in a centralized architecture a single powerful system performs the data fusion. Suboptimally designed systems could end up being resource hogs that take up a lot of bandwidth, since raw data is transferred from the sensors to the central processing system; when a higher number of sensors is used, this type of architecture poses huge resource issues. Moreover, the central unit would need to be very powerful to process and perform the data fusion, which could mean an expensive system.
Distributed systems: state estimation and data processing are performed locally and then communicated to the other systems. Anything from a single node to groups of systems forms the range of processing in this architecture. The fusion node processes the end result only after the individual data processing at the local level is completed.

Hierarchical systems: a system architecture in which higher-level nodes control the lower-level nodes and a mechanism of hierarchical control of the data fusion is set up is a hierarchical data fusion system. In this type of architecture, a combination of distributed and decentralized nodes could be employed to achieve data fusion.

6. SENSOR DATA NOISE

In addition to the sensing information, every sensor is bound to have a level of noise, and while using these sensors one soon realizes that at least a small amount of noise exists in addition to measurement and estimation uncertainties. When such errors or uncertainties occur, it is necessary to use techniques that mitigate their effects on the system. This becomes a complex problem of estimating the state(s) of the system once the system becomes observable.

The mathematical algorithms that accomplish this are filtering techniques. Filtering techniques are applicable in several domains such as economics, science and engineering. Localization systems can make use of these techniques, as there is an innate level of sensor measurement noise and uncertainty in their pose estimation. Filtering techniques have been used in many localization systems, and two of the most popular filtering algorithms are Kalman filters and particle filters.

A. Kalman Filters

Kalman filters were introduced by Rudolf Kalman in 1960 [87]. The technique is also known as Linear Quadratic Estimation (LQE) in the field of controls and autonomous systems. Kalman filtering is an iterative algorithm that uses Bayesian inference to estimate the probabilistic distribution of the uncertain/unknown variables, using a series of measurements that contain measurement and process noise. This works because unknown variables can be estimated better with multiple measurements than with a single measurement. The algorithm is optimized to run in real time and needs only the previous system state and the current input measurement. Kalman filters are extensively used in various fields including autonomous systems, signal processing and navigation, to name a few. They iteratively estimate an output and its associated uncertainty based on a sensor measurement and the uncertainty of that measurement, the previous output and the output uncertainty. The calculation starts as a linear model and continuously determines the parameters of this model; the logic must ensure that the system stays a stochastic linear model.

x_k = A x_{k-1} + B u_k + w_{k-1}    (6-A.1)

z_k = H x_k + v_k    (6-A.2)

Equations 6-A.1 and 6-A.2 are the typical steps in a Kalman filter algorithm, where k denotes the current iteration. Equation 6-A.1 denotes the current estimate of a state variable x_k, which is comprised of the previous system state x_{k-1}, the control signal u_k and the process noise from the previous iteration w_{k-1}. Equation 6-A.2 calculates the current measurement value z_k, which is a linear combination of the unknown variable and the measurement noise v_k. A, B and H are matrices that provide the weights of the corresponding components of the equations; these values can be provided a priori and are system dependent. The noise values w_{k-1} and v_k are drawn from zero-mean Gaussian distributions with covariance matrices Q and R respectively. Q and R are estimated a priori, but the estimates can be coarse since the algorithm will converge to accurate estimators.

Two steps dominate the process, the time update and the measurement update, and each step has a set of equations that must be solved to calculate the present state. The algorithm is as follows:

1) Predict the next state
X_{k,k-1} = Φ X_{k-1,k-1}    (6-A.3)
2) Predict the next state covariance
S_{k,k-1} = Φ S_{k-1,k-1} Φ^T + Q    (6-A.4)
3) Obtain the measurement(s) Y_k
4) Calculate the Kalman gain (weights)
K_k = S_{k,k-1} M^T [M S_{k,k-1} M^T + R]^{-1}    (6-A.5)
5) Update the state
X_{k,k} = X_{k,k-1} + K_k (Y_k - M X_{k,k-1})    (6-A.6)
6) Update the state covariance
S_{k,k} = [I - K_k M] S_{k,k-1}    (6-A.7)
7) Loop (k becomes k + 1)

The filter's output is the result of the state update and state-covariance update equations. These provide the combined estimate from the prediction model and the measurements from the sensors. The mean value of the distribution for each state variable is provided by the state matrix, and the variances by the covariance matrix. First compute the innovation

J_k = Y_k - M X_{k,k-1}    (6-A.8)

then calculate the covariance of the innovation,

COV(J) = M S_{k,k-1} M^T + R    (6-A.9)

The gain matrix can then be calculated as

K_k = S_{k,k-1} M^T COV(J)^{-1}    (6-A.10)

and the updated state can be calculated as

X_{k,k} = X_{k,k-1} + K_k J    (6-A.11)
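A minimal sketch of the linear Kalman filter cycle (Eqs. 6-A.3 to 6-A.7, with the gain computed via the innovation as in Eqs. 6-A.8 to 6-A.11) is given below, using the same Φ, M, Q, R naming as above. The constant-velocity model and the noise values in the usage lines are illustrative assumptions, not parameters of the proposed system.

```python
import numpy as np

def kalman_step(x, S, y, Phi, M, Q, R):
    """One predict/update cycle of the linear Kalman filter.

    x : state estimate X_{k-1,k-1};  S : state covariance S_{k-1,k-1}
    y : new measurement Y_k;  Phi/M : state-transition and measurement matrices
    Q/R : process and measurement noise covariances
    """
    # Predict next state and its covariance (6-A.3, 6-A.4)
    x_pred = Phi @ x
    S_pred = Phi @ S @ Phi.T + Q
    # Innovation, its covariance and the Kalman gain (6-A.8 to 6-A.10)
    J = y - M @ x_pred
    cov_J = M @ S_pred @ M.T + R
    K = S_pred @ M.T @ np.linalg.inv(cov_J)
    # Update state and covariance (6-A.11, 6-A.7)
    x_new = x_pred + K @ J
    S_new = (np.eye(len(x)) - K @ M) @ S_pred
    return x_new, S_new

# usage: 1D constant-velocity target tracked from noisy position readings
dt = 0.1
Phi = np.array([[1.0, dt], [0.0, 1.0]])        # position/velocity transition
M = np.array([[1.0, 0.0]])                     # only position is measured
Q = 0.01 * np.eye(2)                           # coarse process-noise guess
R = np.array([[0.25]])                         # measurement-noise variance
x, S = np.zeros(2), np.eye(2)
for y in [0.10, 0.32, 0.55, 0.81]:             # synthetic range readings
    x, S = kalman_step(x, S, np.array([y]), Phi, M, Q, R)
```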

We propose taking a set of measurements under the present conditions. The system initializes several matrices: the state variables X_{0,0} can be set based on the initial measurements from the sensors, and the covariance of the state can be initialized using the identity matrix I or the covariance matrix Q.

Initially the covariance matrices are not stable, but they stabilize as time progresses and the system runs. The measurement noise covariance matrix R is calculated using calibrations performed earlier: the measurement sensors are used to take a large number of readings of the ground-truth state, from which the variances can be calculated. The variance of the measurements provides the value of σ_n² in R. Literal interpretations of the state transition equations can be used to place the much-needed bounds on the dynamic noise, because the dynamic noise covariance Q is harder to calculate; for instance, 3-sigma bounds on σ_a² in Q can be obtained by interpreting the target acceleration under a constant-velocity model with dynamic noise.

The relative ratio of the measurement noise to the dynamic noise is the important factor; it determines the gains. With the Kalman filter, it is common to keep one of the noise covariance matrices constant while adjusting the other continuously until the desired performance is achieved. The family of Kalman filters should be used in systems that run continuously for better accuracy or performance, and not for a quick few iterations, since it takes several iterations just for the filter to stabilize.

B. Particle Filters

Particle filters were first introduced in 1993 [88] and have become a very popular class of numerical methods for non-linear, non-Gaussian estimation problems. Particle filters, like any member of the family of Bayes filters such as Kalman filters and HMMs, estimate the posterior distribution of the state of the dynamical system conditioned on the data, p(x_k | z^k, u^k). They do so via the following recursive formula:

p(x_k | z^k, u^k) = η_k p(z_k | x_k) ∫ p(x_k | u_k, x_{k-1}) p(x_{k-1} | z^{k-1}, u^{k-1}) dx_{k-1}    (6-B.1)–(6-B.3)

Broadly, there are three steps involved in implementing a particle filter [89], [90]:
1) Importance sampling: sample the present trajectories x̃_t^(i), update them, and normalize the weights w̃_t^(i).
2) Selection: samples x̃_t^(i) that have high importance weights w̃_t^(i) are multiplied, and samples that have low importance weights are suppressed.
3) Markov chain Monte Carlo transition: apply a Markov transition kernel with an invariant distribution given by p(x_{0:t}^(i) | y_{1:t}) and obtain (x_{0:t}^(i)).

In comparison with standard approximation methods, such as the popular Extended Kalman Filter, the principal advantage of particle methods is that they do not rely on any local linearization technique or any crude functional approximation [90], [91]. This technique, however, has its drawbacks, namely an expensive computational process and complexity. Back in 1993 this was an issue, but nowadays we can make use of CPUs, GPUs and similar high-power computing to reduce the computational effort. One of the main deficiencies of a particle filter is that particle filters are insensitive to costs that might arise from the approximate nature of the particle representation: their only criterion for generating a particle is the posterior likelihood of a state. Due to this deficiency, we will implement a Kalman filter and not a particle filter.
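A minimal bootstrap particle filter sketch following the importance-sampling and selection steps listed above (the optional MCMC move is omitted); the 1D random-walk motion model and Gaussian measurement likelihood are illustrative assumptions, not the filter of any cited work.

```python
import numpy as np

def particle_filter_step(particles, weights, control, measurement,
                         motion_noise=0.1, meas_noise=0.5,
                         rng=np.random.default_rng()):
    """One cycle of a bootstrap particle filter.

    particles : (N,) array of state hypotheses; weights : (N,) normalized weights
    control   : odometry increment applied to every particle
    measurement : scalar observation of the state
    """
    n = len(particles)
    # Importance sampling: propagate through the motion model and reweight
    particles = particles + control + rng.normal(0.0, motion_noise, n)
    likelihood = np.exp(-0.5 * ((measurement - particles) / meas_noise) ** 2)
    weights = weights * likelihood
    weights /= weights.sum()
    # Selection (resampling): multiply high-weight particles, suppress low-weight ones
    idx = rng.choice(n, size=n, p=weights)
    return particles[idx], np.full(n, 1.0 / n)

# usage: track a 1D position from noisy range readings
particles = np.random.default_rng(0).uniform(0.0, 5.0, 500)
weights = np.full(500, 1.0 / 500)
for u, z in [(0.2, 1.10), (0.2, 1.35), (0.2, 1.52)]:
    particles, weights = particle_filter_step(particles, weights, u, z)
estimate = particles.mean()
```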
7. FRAMEWORK

A sensor is an electronic device that measures physical aspects of an environment and outputs data readable by a machine (a digital computer). Sensors provide a direct perception of the environment they are implemented in. Typically a suite of sensors is used, since it is the inherent property of an individual sensor to provide a single aspect of an environment; this not only enables completeness of the data but also improves the accuracy of measuring the environment. A formal description of data fusion is given by Hall et al. [92], [93].

Fig. 3: High level Perception Architecture

The framework of this survey is as given in Figure 4. As part of this survey we limit the sensors to LiDAR and camera, but any sensor such as sonar, stereo, monocular, radar, lidar etc. can be used. The initial step is raw data capture using the sensors. The data is filtered and used to detect the objects in the environment. The next step is to classify the objects. The classification information is used to fuse the data into final information to feed into the control algorithm. The classification could potentially give details of pedestrians, furniture, vehicles, buildings etc. Data fusion at this high level will also enable tracking moving objects, as shown in the research conducted by Garcia [94].

Fig. 4: Data fusion framework

1) Raw data sensing: For this survey, LiDAR and camera are used. LiDAR is the primary sensor due to its accuracy of detection and the higher resolution of its data. A 2D LiDAR has been studied and reviewed: the RP Lidar [95] from Slamtec was used in this study. It is a planar scanner which is inexpensive and has provided us acceptable results in the detection of objects. Two LiDARs, one in the front and one in the rear, are used to cover the front and the rear of the vehicle. The LiDAR is effective in providing the shape of the objects in the environment that could be hazardous obstacles to the vehicle. The other sensor that was studied was a stereo camera, the Intel RealSense D435 [96], which provides depth information as well. The benefit of using this combination is the accuracy, speed and resolution of the LiDAR and the quality and richness of data from the stereo camera. Together these two sensors provide an accurate, rich and fast data set for the object detection layer.
2) Object detection: Object detection is the method of locating an object of interest in the sensor output. A LiDAR scans objects in its environment differently than a camera; hence the methodology to detect objects in the data from these sensors differs as well. The research community has used this technique to detect objects in aerial, ground and underwater environments.
3) Object classification: The objects are detected and then classified into several types, so that they can be grouped into small, medium and large objects, or into hazard levels of non-hazardous or hazardous, such that the right navigation can be applied for the appropriate object.

4) Data fusion: After the classification, the data is fused into final information that is input to the control layer. The data fusion layer output provides location information for the objects in the map of the environment, so that the autonomous vehicle can, for instance, avoid the obstacle, stop if the object is a destination, or wait for a state to be reached for further action if the object is deemed a marker or milestone. The control segment will take the necessary action, depending on the behavior as sensed by the sensor suite.

8. METHODOLOGY The methodology involves the data fusion of multi-modal data with one or many LiDAR and depth camera as the sources [84]. The type of datafusion would be based on input- output as described by Dasarathy et.al. [84]. They propose a classification strategy based on input - output of entities like data, architecture, features and decisions. We use fusion of data in the first layer, fusion of features in second, and finally the decision layer fusion. In case of the LiDAR and camera data fusion, we need two Fig. 4: Data fusion framework steps to effectively integrate/ fuse the data. This is because, we need to match the LiDAR datapoints onto the image data and for this to succeed, the resolutions need to match or be close to uniform. These two steps are: 1) Geometric alignment of the sensor data [97] 2) Resolution match between the sensor data [98] A. Geometric alignment of the sensor data The first and foremost step in the data fusion methodology is the alignment of the sensor data. In our framework the LiDAR data, the depth sensor data and the optical image data will be geometrically aligned to ensure the corresponding pixel info Fig. 5: High level perception task in the camera data for each of the depth camera and lidar data. B. Resolution match between the sensor data vehicle. The other sensor that was studied was a stereo Once the data is geometrically aligned, there must be a camera from Intel model Realsense D435 [96]. This is match in the resolution between the sensor data. The optical a stereo sensor that provides depth information as well. camera has the highest resolution of 1920 x 1080 at 30 fps, The benefit of using this combination is the accuracy, followed by the depth camera output which has a resolution of speed and resolution of the LiDAR and the quality and 1280 x 720 pixels at 90fps and finally the LiDAR data has the richness of data from the stereo camera. Together these lowest resolution. This step is an extrinsic calibration of the 2 sensors provide an accurate, rich and fast data set for data. We propose a calibration using geometric figures such the object detection layer. as circles and rectangles and segmentation of the same using 2) Object Detection: Object Detection is the method of reflective tape across the calibration board. Keeping in mind locating an object of interest in the sensor output. Lidar the depth aspect of a liDAR and the stereo camera, 3D depth data scans objects differently in its environment than a boards can be developed out of simple 2D images. Figure: camera. Hence the methodology to detect objects in the 6 shows the depth calibration board. The dimensions of the data from these sensors would be different as well. The board are : length 58” x width 18” x height 41.5”. For the research community has used this technique to detect Intel Realsense camera, we will need to perform a depthscale objects in aerial, ground, underwater environments. calibration. Figure: 7 shows the phone calibration tool [96]. 3) Object Classification: The Objects are detected and then An example of the phone dimensions is given in 8, in this they are classified into several types, so that they can be case an iPhone, that can display the calibration image given in grouped into small, medium and large objects, or hazard figure: 7 so that the user can calibrate the camera. In addition levels of non hazardous or hazardous, such that the right to an iPhone, this is available for phones or if the navigation can be handled for the appropriate object. 
user chooses non-phone medium, this is available for print on 4) Data Fusion: After the classification, the data is fused a letter size paper. Another addition to the calibration toolkit to finalize information as input to the control layer. The is the speck pattern board. These pattern boards (not to scale- data fusion layer output will provide location informa- figure: 9) give us better results since there is a higher spatial tion of the objects in the map of the environment, so frequency content and there is no laser speckle. It has been that the autonomous vehicle can for instance, avoid the witnessed that a passive target gives about (25 - 30%) [96].
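To make the two steps above concrete, the following is a minimal Python/NumPy sketch, not the exact implementation used in this work, of the alignment and resolution-match idea: LiDAR points are transformed into the camera frame, projected through a pinhole model, and snapped to the nearest pixel of the higher-resolution optical image. The intrinsic matrix K, the extrinsic transform T_cam_lidar and the placeholder scan are hypothetical values used only for illustration.

```python
import numpy as np

def project_lidar_to_image(points_lidar, T_cam_lidar, K, image_size):
    """Project 3-D LiDAR points into the camera image plane.

    points_lidar : (N, 3) points in the LiDAR frame (metres).
    T_cam_lidar  : (4, 4) homogeneous extrinsic transform, LiDAR -> camera.
    K            : (3, 3) camera intrinsic matrix.
    image_size   : (width, height) of the optical image, e.g. (1920, 1080).
    Returns pixel coordinates (M, 2) and the camera-frame depth (M,) of the retained points.
    """
    n = points_lidar.shape[0]
    homog = np.hstack([points_lidar, np.ones((n, 1))])        # (N, 4) homogeneous points
    pts_cam = (T_cam_lidar @ homog.T).T[:, :3]                # points in the camera frame
    in_front = pts_cam[:, 2] > 0.1                            # keep points in front of the lens
    pts_cam = pts_cam[in_front]
    uvw = (K @ pts_cam.T).T                                   # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3]
    uv = np.round(uv).astype(int)                             # resolution match: nearest pixel
    w, h = image_size
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv[inside], pts_cam[inside, 2]

# Hypothetical calibration values, for illustration only.
K = np.array([[1380.0, 0.0, 960.0],
              [0.0, 1380.0, 540.0],
              [0.0, 0.0, 1.0]])
T_cam_lidar = np.eye(4)
T_cam_lidar[:3, 3] = [0.0, -0.08, -0.05]          # small lever arm between the two sensors

scan = np.random.uniform(-5, 5, size=(360, 3))     # placeholder for one LiDAR scan
pixels, depths = project_lidar_to_image(scan, T_cam_lidar, K, (1920, 1080))
```

Once every LiDAR return carries a pixel coordinate and a depth, the per-pixel depth from the stereo camera and the LiDAR range can be compared or fused directly at that pixel.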

Fig. 9: Realsense iPhone speck pattern for calibration

Fig. 6: Depth calibration

The projector can be a drawback in some cases, and it may help to turn off the projection from the camera and light up the subject using clean white light. We have also observed that the RealSense cameras perform better in open, bright sunlight, since the natural textures are more visible. It should be noted that, in the case of the depth cameras, the stereo depth has a limitation due to the quality differences between the left and right images. We witnessed this phenomenon with our smartphones as well, which shot better pictures in bright daylight than in the dimly lit indoor lab environment.

There are several calibration techniques for the LiDAR and camera. Mirzaei et al. [99] provide techniques for intrinsic calibration of a LiDAR and extrinsic calibration based on camera readings. Dong et al. [100] provide a technique for extrinsic calibration of a 2D LiDAR and a camera. Li et al. [101] also developed a technique for 2D LiDAR and camera calibration, although for an indoor environment. Kaess et al. [102] developed a novel technique to calibrate a 3D LiDAR and a camera.
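As a rough illustration of the extrinsic step, and not a substitute for the methods in [99]-[102], the sketch below estimates a LiDAR-to-camera transform from a handful of correspondences between calibration-board corners measured in the LiDAR frame and their detected pixel locations, using OpenCV's solvePnP. All numeric values (board corners, pixel locations, intrinsics) are hypothetical.

```python
import numpy as np
import cv2  # OpenCV, used here only to illustrate a PnP-based extrinsic estimate

# Hypothetical 3-D positions of reflective-tape corners on the calibration board,
# expressed in the LiDAR frame (metres), and their detected pixel locations.
board_pts_lidar = np.array([[2.0, -0.5, 0.2],
                            [2.0,  0.5, 0.2],
                            [2.0,  0.5, 1.0],
                            [2.0, -0.5, 1.0]], dtype=np.float64)
board_pts_image = np.array([[640.0, 620.0],
                            [1280.0, 620.0],
                            [1280.0, 180.0],
                            [640.0, 180.0]], dtype=np.float64)

K = np.array([[1380.0, 0.0, 960.0],
              [0.0, 1380.0, 540.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)  # assume the image has already been undistorted

ok, rvec, tvec = cv2.solvePnP(board_pts_lidar, board_pts_image, K, dist)
R, _ = cv2.Rodrigues(rvec)              # rotation, LiDAR frame -> camera frame
T_cam_lidar = np.eye(4)
T_cam_lidar[:3, :3] = R
T_cam_lidar[:3, 3] = tvec.ravel()
print("Estimated extrinsic transform:\n", T_cam_lidar)
```

In practice, many more correspondences and a refinement step are used; the works cited above formulate the problem with line and plane features or analytical least-squares initializations rather than a plain PnP solve.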

9. HARDWARE DESIGN

The system we focus on, 'SmartWheels', can be broadly classified into two areas, namely:
1) Brain Computing Interface (BCI) command initiation
2) Wheelchair navigation

The navigation of the wheelchair comprises an integrated mapping, localization, obstacle avoidance and target/destination seeking system. As part of this survey, we focus on the obstacle avoidance subsystem as part of a safe navigation control system. The hardware for this research consists of a smart wheelchair (a Jazzy Pride power wheelchair) integrated with an Nvidia Jetson TX2 GPU computing system and a suite of sensors, including proximity sensors, a 2D LiDAR and a depth camera.

The 2D LiDAR is the RPLIDAR A1 [95], a low-cost 360 degree 2D laser scanner solution developed by SLAMTEC. It can perform a 360 degree scan within a 6 m range, and the produced 2D point cloud data can be used for mapping, localization and object/environment modeling. The RPLIDAR A1's scanning frequency reaches 5.5 Hz when sampling 360 points per revolution, and it can be configured up to a maximum of 10 Hz. The RPLIDAR A1 is basically a laser triangulation measurement system; it works well in all kinds of indoor environments and in outdoor environments without direct sunlight. The depth camera is the Realsense D435 perception camera [96], which uses stereo vision to calculate depth. It is a USB-powered depth camera consisting of a pair of depth sensors, an RGB sensor and an infrared projector; it is used in this system to add depth perception capability to the autonomous wheelchair.

Fig. 7: Realsense phone calibration tool

Fig. 8: Realsense iPhone calibration screen dimensions
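For completeness, the sketch below shows one way the two sensors could be read on the Jetson, assuming the Intel pyrealsense2 bindings and the community rplidar Python driver are installed. The stream settings, the serial port and the unit conversions are assumptions for illustration, not a prescription of the configuration used here.

```python
import numpy as np
import pyrealsense2 as rs          # Intel RealSense SDK Python bindings (assumed installed)
from rplidar import RPLidar        # community driver for the SLAMTEC RPLIDAR A1 (assumed installed)

# --- RealSense D435: depth + colour streams --------------------------------
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 1280, 720, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 1920, 1080, rs.format.bgr8, 30)
profile = pipeline.start(config)
depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()

frames = pipeline.wait_for_frames()
depth_frame = frames.get_depth_frame()
color_frame = frames.get_color_frame()
depth_m = np.asanyarray(depth_frame.get_data()) * depth_scale   # depth image in metres
color = np.asanyarray(color_frame.get_data())                   # BGR colour image

# --- RPLIDAR A1: one full 360 degree revolution -----------------------------
lidar = RPLidar('/dev/ttyUSB0')             # serial port is platform dependent
scan = next(lidar.iter_scans())             # list of (quality, angle_deg, distance_mm) tuples
ranges = np.array([[np.radians(a), d / 1000.0] for _, a, d in scan if d > 0])

lidar.stop()
lidar.stop_motor()
lidar.disconnect()
pipeline.stop()

print("depth image:", depth_m.shape, "colour image:", color.shape,
      "lidar returns this revolution:", len(ranges))
```

The LiDAR returns arrive as (quality, angle in degrees, range in millimetres) tuples, which the snippet converts to (bearing in radians, range in metres) pairs for the later fusion stage.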

10. DETECTION ALGORITHM

After a successful data fusion, the information can be used to detect objects. There is a substantial list of detection algorithms that can very efficiently detect objects in the environment where the autonomous vehicle operates. As an example, consider an autonomous wheelchair operating in a known environment, i.e., an environment that has been mapped, where the vehicle needs to navigate to known destinations. If the environment never changes, the operator of the vehicle may simply use the stored navigation routes to reach the destination from the source, for example from the living room to the kitchen. However, in an environment like a house, obstacles such as a chair might have been moved, a child could be playing in the living room, or an assistive dog may be lying on the floor and resting. These could be termed obstacles that the vehicle needs to avoid, or it will end up harming the child, pet or operator. Hence the need for the vehicle to operate efficiently.

Keeping this scenario in mind, the wheelchair deals with a two-tier sensor data fusion. The first tier is the outer loop of the RPLidar, which detects distant objects, obstacles and so on. The second tier is the stereo camera Realsense D435 output [96], which is used for immediate object detection, recognition and avoidance as needed. There are many classical methods for the detection of objects in an image, such as dense image pyramids and classifier pyramids [103]. There are also many feature detection methods, such as fast feature pyramids, that can quickly identify places in the image where there could potentially be a person [103]; their speed is around 30 frames per second. In addition, we reviewed R-CNN and its variants, including the original R-CNN, Fast R-CNN [104] and Faster R-CNN [105], the Single Shot Detector (SSD) [106] and a fast version of You Only Look Once (YOLO-Fast) [107]–[109]. We chose to use YOLO due to its small profile and speed of around 155 frames per second, which is essential for the wheelchair's detection task. Object detection is performed in this work to avoid obstacles if there are any in the navigation path of the wheelchair.

Figure 10 gives a high-level methodology of the autonomous system. The raw signal is sensed and processed, and an initial classification is performed using YOLO-based classification techniques. The response time was comparable with the KITTI benchmark [110] results, and our initial results are comparable with those of Qi et al. [111] on 3D object detection from RGB-D data and with Complex-YOLO, a flavor of fast YOLO by Simon et al. [112]. This first level of classification is performed on the data, and features are extracted. The result is fed through an alignment process in order to correlate the LiDAR with the stereo depth camera data. Finally, a second classification is performed using the features, to extract the decisions. A sketch of this camera-to-LiDAR association is given below.

Fig. 10: Architecture of a fusion system
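The following sketch illustrates the kind of camera-to-LiDAR association described above: each 2-D bounding box produced by the detector is paired with the projected LiDAR returns that fall inside it, yielding a range and bearing per detection. It is a simplified stand-in for the fusion layer, with hypothetical box and point values, and assumes the projection step sketched in Section 8.

```python
import numpy as np

def attach_range_and_bearing(boxes, lidar_pixels, lidar_points_lidar):
    """Associate each 2-D detection with the LiDAR returns that project inside its box.

    boxes              : (B, 4) array of [x1, y1, x2, y2] pixel boxes from the detector.
    lidar_pixels       : (N, 2) pixel coordinates of LiDAR points projected into the image.
    lidar_points_lidar : (N, 3) the same points in the LiDAR frame (x forward, y left).
    Returns a list with one (range_m, bearing_rad) tuple per box, or None when no point falls inside.
    """
    fused = []
    rng = np.hypot(lidar_points_lidar[:, 0], lidar_points_lidar[:, 1])
    bearing = np.arctan2(lidar_points_lidar[:, 1], lidar_points_lidar[:, 0])
    for x1, y1, x2, y2 in boxes:
        inside = ((lidar_pixels[:, 0] >= x1) & (lidar_pixels[:, 0] <= x2) &
                  (lidar_pixels[:, 1] >= y1) & (lidar_pixels[:, 1] <= y2))
        if not np.any(inside):
            fused.append(None)                    # camera-only detection, no range support
            continue
        k = np.argmin(rng[inside])                # closest return inside the box
        fused.append((float(rng[inside][k]), float(bearing[inside][k])))
    return fused

# Hypothetical example: two detections and a handful of projected LiDAR returns.
boxes = np.array([[600, 300, 900, 700], [1200, 350, 1500, 650]])
lidar_pixels = np.array([[650, 500], [880, 520], [1300, 480], [40, 900]])
lidar_points = np.array([[2.1, 0.4, 0.0], [2.2, 0.1, 0.0], [4.8, -1.2, 0.0], [1.0, 3.0, 0.0]])
print(attach_range_and_bearing(boxes, lidar_pixels, lidar_points))
```

Taking the closest return inside a box is a conservative choice for obstacle avoidance; a median or a clustering step could be used instead when several objects overlap in the image.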
11. RESULTS

Our initial experiment was to utilize techniques of data fusion to detect obstacles in a mapped environment. Our preliminary results demonstrate that the system was able to detect obstacles with sub-second response times. The obstacles were placed at 1, 2, 5 and 10 meters; the maximum range of the RPLidar and Realsense D435 camera combination is about 10 meters. The fusion system was first executed on an Nvidia Jetson TX1 and then on the Nvidia Jetson TX2. Both Simon et al. and Qi et al. [111], [112] have demonstrated detection using individual sensors; however, robustness is higher in our system due to the combined data fusion of the LiDAR and the stereo camera, which are different types of sensors. We have seen that calibration and data fusion between a camera and a LiDAR allow for a color-to-point-cloud association. This fusion provides richer environment models that are useful in navigation, mapping, obstacle detection and other related tasks.

12. CONCLUSION

As part of this paper, we have performed an exhaustive review of the available data fusion techniques that can be used in intelligent mobility systems. We examined the multi-disciplinary nature of data fusion and discussed why multiple sensors are better than one when used for robot navigation. A discussion of the concepts of robot perception is provided, in addition to a discussion of some of the previous seminal work in this area. Several autonomous systems depend on data fusion techniques for successful task implementation, the tasks being navigation tasks such as localization, mapping, obstacle avoidance, object recognition, emergency vehicle navigation modules, etc. We have seen from novel research publications how data fusion will drive the future of autonomous systems and extend algorithms into areas of commercial autonomous systems, in addition to military systems. Filtering techniques such as Kalman filters, particle filters and similar techniques are discussed. A comparison of the different types of data fusion, documenting the advantages and disadvantages, is given as well. Some sensors of interest

like the Intel Realsense, the RPLidar and others were researched, and their performance and capabilities are mentioned. Some calibration techniques suggested by the vendors are discussed. We discuss an architecture that uses multi-sensor data fusion to detect, process and control a robot. This survey also describes the benefits of using data fusion in mapping, localization, and obstacle detection and avoidance. Finally, we discuss some initial results of our work in data fusion using the Realsense and the RPLidar.

ACKNOWLEDGMENTS

The authors would like to thank everyone in the Autonomous Control Engineering Lab, Electrical Engineering Department, the University of Texas at San Antonio, for their support.

REFERENCES

[1] L. Fehr, W. E. Langbein, and S. B. Skaar, "Adequacy of power wheelchair control interfaces for persons with severe disabilities: A clinical survey," Journal of Rehabilitation Research and Development, vol. 37, no. 3, pp. 353–360, 2000.
[2] R. C. Simpson, "Smart wheelchairs: A literature review," Journal of Rehabilitation Research and Development, vol. 42, no. 4, p. 423, 2005.
[3] M. M. Martins, C. P. Santos, A. Frizera-Neto, and R. Ceres, "Assistive mobility devices focusing on smart walkers: Classification and review," Robotics and Autonomous Systems, vol. 60, no. 4, pp. 548–562, 2012.
[4] T. H. Noonan, J. Fisher, and B. Bryant, "Autonomous lawn mower," Apr. 20 1993, US Patent 5,204,814.
[5] F. Bernini, "Autonomous lawn mower with recharge base," Feb. 23 2010, US Patent 7,668,631.
[6] I. Ulrich, F. Mondada, and J. Nicoud, "Autonomous vacuum cleaner," Robotics and Autonomous Systems, vol. 19, 03 1997.
[7] G. Mutiara, G. Hapsari, and R. Rijalul, "Smart guide extension for blind cane," pp. 1–6, 05 2016.
[8] B. Leibe, E. Seemann, and B. Schiele, "Pedestrian detection in crowded scenes," in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1. IEEE, 2005, pp. 878–885.
[9] T.-D. Vu, "Vehicle perception: Localization, mapping with detection, classification and tracking of moving objects," Ph.D. dissertation, Institut National Polytechnique de Grenoble-INPG, 2009.
[10] T.-D. Vu and O. Aycard, "Laser-based detection and tracking moving objects using data-driven markov chain monte carlo," in Robotics and Automation, 2009. ICRA'09. IEEE International Conference on, 2009, pp. 3800–3806.
[11] R. Nobrega, J. Quintanilha, and C. O'Hara, "A noise-removal approach for lidar intensity images using anisotropic diffusion filtering to preserve object shape characteristics," American Society for Photogrammetry and Remote Sensing - ASPRS Annual Conference 2007: Identifying Geospatial Solutions, vol. 2, pp. 471–481, 01 2007.
[12] N. Cao, C. Zhu, Y. Kai, and P. Yan, "A method of background noise reduction in lidar data," Applied Physics B, vol. 113, 05 2013.
[13] W. Burgard, D. Fox, D. Hennig, and T. Schmidt, "Estimating the absolute position of a mobile robot using position probability grids," in Proceedings of the National Conference on Artificial Intelligence, 1996, pp. 896–901.
[14] J.-S. Gutmann and C. Schlegel, "Amos: Comparison of scan matching approaches for self-localization in indoor environments," in Proceedings of the First Euromicro Workshop on Advanced Mobile Robots (EUROBOT'96). IEEE, 1996, pp. 61–67.
[15] Z. Zhang, R. Deriche, O. Faugeras, and Q.-T. Luong, "A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry," Artificial Intelligence, vol. 78, no. 1-2, pp. 87–119, 1995.
[16] F. Lu and E. Milios, "Robot pose estimation in unknown environments by matching 2d range scans," Journal of Intelligent and Robotic Systems, vol. 18, no. 3, pp. 249–275, 1997.
[17] S. Thrun, D. Fox, W. Burgard, and F. Dellaert, "Robust monte carlo localization for mobile robots," Artificial Intelligence, vol. 128, no. 1-2, pp. 99–141, 2001.
[18] S. Thrun et al., "Robotic mapping: A survey," Exploring Artificial Intelligence in the New Millennium, vol. 1, no. 1-35, p. 1, 2002.
[19] S. Thrun and A. Bücken, "Integrating grid-based and topological maps for mobile robot navigation," in Proceedings of the National Conference on Artificial Intelligence, 1996, pp. 944–951.
[20] S. Thrun, "Robotic mapping: A survey," CMU-CS-02-111, 2002.
[21] J. Borenstein and Y. Koren, "The vector field histogram-fast obstacle avoidance for mobile robots," IEEE Transactions on Robotics and Automation, vol. 7, no. 3, pp. 278–288, 1991.
[22] D. Fox, W. Burgard, and S. Thrun, "The dynamic window approach to collision avoidance," IEEE Robotics & Automation Magazine, vol. 4, no. 1, pp. 23–33, 1997.
[23] A. Elfes, "Using occupancy grids for mobile robot perception and navigation," Computer, no. 6, pp. 46–57, 1989.
[24] R. G. Danescu, "Obstacle detection using dynamic particle-based occupancy grids," in 2011 International Conference on Digital Image Computing: Techniques and Applications. IEEE, 2011, pp. 585–590.
[25] J.-H. Cho, D.-S. Pae, M.-T. Lim, and T.-K. Kang, "A real-time obstacle avoidance method for autonomous vehicles using an obstacle-dependent gaussian potential field," Journal of Advanced Transportation, vol. 2018, 2018.
[26] R. H. Rasshofer, M. Spies, and H. Spies, "Influences of weather phenomena on automotive laser radar systems," Advances in Radio Science, vol. 9, no. B.2, pp. 49–60, 2011.
[27] M. Kytö, M. Nuutinen, and P. Oittinen, "Method for measuring stereo camera depth accuracy based on stereoscopic vision," in Three-Dimensional Imaging, Interaction, and Measurement, vol. 7864. International Society for Optics and Photonics, 2011, p. 78640I.
[28] T. Duong Pham, R. Shrestha, J. Virkutyte, and M. Sillanpää, "Recent studies in environmental applications of ultrasound," Canadian Journal of Civil Engineering, vol. 36, pp. 1849–1858, 11 2009.
[29] R. C. Luo, C.-C. Yih, and K. L. Su, "Multisensor fusion and integration: approaches, applications, and future research directions," IEEE Sensors Journal, vol. 2, no. 2, pp. 107–119, 2002.
[30] D. Lahat, T. Adali, and C. Jutten, "Multimodal data fusion: An overview of methods, challenges, and prospects," Proceedings of the IEEE, vol. 103, no. 9, pp. 1449–1477, Sept 2015.
[31] B.-S. Choi and J.-J. Lee, "Sensor network based localization algorithm using fusion sensor-agent for indoor service robot," IEEE Transactions on Consumer Electronics, vol. 56, no. 3, 2010.
[32] B.-K. Dan, Y.-S. Kim, J.-Y. Jung, S.-J. Ko et al., "Robust people counting system based on sensor fusion," IEEE Transactions on Consumer Electronics, vol. 58, no. 3, pp. 1013–1021, 2012.
[33] M. Caputo, K. Denker, B. Dums, and G. Umlauf, "3d hand gesture recognition based on sensor fusion of commodity hardware," Mensch & Computer, 2012.
[34] A. Pacha, "Sensor fusion for robust outdoor augmented reality tracking on mobile devices," 2013.
[35] V. De Silva, J. Roche, and A. Kondoz, "Fusion of lidar and camera sensor data for environment sensing in driverless vehicles," 2018.
[36] V. John, Q. Long, Y. Xu, Z. Liu, and S. Mita, "Sensor fusion and registration of lidar and stereo camera without calibration objects," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. 100, no. 2, pp. 499–509, 2017.
[37] D. Huber, T. Kanade et al., "Integrating lidar into stereo for fast and improved disparity computation," in 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), 2011 International Conference on. IEEE, 2011, pp. 405–412.
[38] D. J. Balkcom and M. T. Mason, "Time optimal trajectories for bounded velocity differential drive vehicles," The International Journal of Robotics Research, vol. 21, no. 3, pp. 199–217, 2002.
[39] I. Kolmanovsky and N. H. McClamroch, "Developments in nonholonomic control problems," IEEE Control Systems Magazine, vol. 15, no. 6, pp. 20–36, 1995.
[40] J.-A. Fernández-Madrigal, Simultaneous Localization and Mapping for Mobile Robots: Introduction and Methods. IGI Global, 2012.
[41] J. Borenstein and Y. Koren, "Obstacle avoidance with ultrasonic sensors," IEEE Journal on Robotics and Automation, vol. 4, no. 2, pp. 213–218, April 1988.

[42] J. J. Leonard, H. F. Durrant-Whyte, and I. J. Cox, "Dynamic map building for an autonomous mobile robot," The International Journal of Robotics Research, vol. 11, no. 4, pp. 286–298, 1992.
[43] P. Mirowski, M. Grimes, M. Malinowski, K. M. Hermann, K. Anderson, D. Teplyashin, K. Simonyan, A. Zisserman, R. Hadsell et al., "Learning to navigate in cities without a map," in Advances in Neural Information Processing Systems, 2018, pp. 2419–2430.
[44] A. Pritsker, "Introduction to simulation and SLAM II, third edition," 1986.
[45] M. W. M. G. Dissanayake, P. Newman, S. Clark, H. F. Durrant-Whyte, and M. Csorba, "A solution to the simultaneous localization and map building (slam) problem," IEEE Transactions on Robotics and Automation, vol. 17, no. 3, pp. 229–241, June 2001.
[46] A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse, "Monoslam: Real-time single camera slam," IEEE Transactions on Pattern Analysis & Machine Intelligence, no. 6, pp. 1052–1067, 2007.
[47] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, "A benchmark for the evaluation of rgb-d slam systems," in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 573–580.
[48] "List of slam methods."
[49] S. Huang and G. Dissanayake, "Robot localization: An introduction," Wiley Encyclopedia of Electrical and Electronics Engineering, pp. 1–10, 1999.
[50] ——, "Convergence and consistency analysis for extended kalman filter based slam," IEEE Transactions on Robotics, vol. 23, no. 5, pp. 1036–1049, 2007.
[51] H. Liu, H. Darabi, P. Banerjee, and J. Liu, "Survey of wireless indoor positioning techniques and systems," IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 37, no. 6, pp. 1067–1080, 2007.
[52] S. Kwon, K. Yang, and S. Park, "An effective kalman filter localization method for mobile robots," in Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on. IEEE, 2006, pp. 1524–1529.
[53] L. Ojeda and J. Borenstein, "Personal dead-reckoning system for gps-denied environments," in Safety, Security and Rescue Robotics, 2007. SSRR 2007. IEEE International Workshop on. IEEE, 2007, pp. 1–6.
[54] R. W. Levi and T. Judd, "Dead reckoning navigational system using accelerometer to measure foot impacts," Dec. 10 1996, US Patent 5,583,776.
[55] E. Elnahrawy, X. Li, and R. P. Martin, "The limits of localization using signal strength: A comparative study," in 2004 First Annual IEEE Communications Society Conference on Sensor and Ad Hoc Communications and Networks, 2004. IEEE SECON 2004. IEEE, 2004, pp. 406–414.
[56] A. Neves, H. C. Fonseca, and C. G. Ralha, "Location agent: a study using different wireless protocols for indoor localization," International Journal of Wireless Communications and Mobile Computing, vol. 1, pp. 1–6, 2013.
[57] K. Whitehouse, C. Karlof, and D. Culler, "A practical evaluation of radio signal strength for ranging-based localization," ACM SIGMOBILE Mobile Computing and Communications Review, vol. 11, no. 1, pp. 41–52, 2007.
[58] S. He and S.-H. G. Chan, "Wi-fi fingerprint-based indoor positioning: Recent advances and comparisons," IEEE Communications Surveys & Tutorials, vol. 18, no. 1, pp. 466–490, 2016.
[59] Y. Wang, Q. Ye, J. Cheng, and L. Wang, "Rssi-based bluetooth indoor localization," in 2015 11th International Conference on Mobile Ad-hoc and Sensor Networks (MSN). IEEE, 2015, pp. 165–171.
[60] A. Krizhevsky. Cudaconvnet. [Online]. Available: https://www.space.com/19794-navstar.html
[61] ——. Cudaconvnet. [Online]. Available: https://www.asirobots.com/platforms/mobius/
[62] C. Ramer, J. Sessner, M. Scholz, X. Zhang, and J. Franke, "Fusing low-cost sensor data for localization and mapping of automated guided vehicle fleets in indoor applications," in Multisensor Fusion and Integration for Intelligent Systems (MFI), 2015 IEEE International Conference on. IEEE, 2015, pp. 65–70.
[63] D. Fontanelli, L. Ricciato, and S. Soatto, "A fast ransac-based registration algorithm for accurate localization in unknown environments using lidar measurements," in Automation Science and Engineering, 2007. CASE 2007. IEEE International Conference on. IEEE, 2007, pp. 597–602.
[64] K. Wan, L. Ma, and X. Tan, "An improvement algorithm on ransac for image-based indoor localization," in Wireless Communications and Mobile Computing Conference (IWCMC), 2016 International. IEEE, 2016, pp. 842–845.
[65] J. Biswas and M. Veloso, "Depth camera based indoor mobile robot localization and navigation," in Robotics and Automation (ICRA), 2012 IEEE International Conference on. IEEE, 2012, pp. 1697–1702.
[66] "Htc vive."
[67] C.-C. Wang, C. Thorpe, S. Thrun, M. Hebert, and H. Durrant-Whyte, "Simultaneous localization, mapping and moving object tracking," The International Journal of Robotics Research, vol. 26, no. 9, pp. 889–916, 2007.
[68] H. Baltzakis, A. Argyros, and P. Trahanias, "Fusion of laser and visual data for robot motion planning and collision avoidance," Machine Vision and Applications, vol. 15, no. 2, pp. 92–100, 2003.
[69] H. P. Moravec, "Sensor fusion in certainty grids for mobile robots," AI Magazine, vol. 9, no. 2, p. 61, 1988.
[70] A. Elfes, "Occupancy grids: A probabilistic framework for robot perception and navigation," Ph.D. thesis, Carnegie-Mellon University, 1989.
[71] S. P. Engelson and D. V. McDermott, "Error correction in mobile robot map learning," in Robotics and Automation, 1992. Proceedings., 1992 IEEE International Conference on. IEEE, 1992, pp. 2555–2560.
[72] D. Kortenkamp and T. Weymouth, "Topological mapping for mobile robots using a combination of sonar and vision sensing," in AAAI, vol. 94, 1994, pp. 979–984.
[73] B. Kuipers and Y.-T. Byun, "A robot exploration and mapping strategy based on a semantic hierarchy of spatial representations," Robotics and Autonomous Systems, vol. 8, no. 1-2, pp. 47–63, 1991.
[74] S. Thrun, A. Buecken, W. Burgard, D. Fox, A. B. Wolfram, B. D. Fox, T. Fröhlinghaus, D. Hennig, T. Hofmann, M. Krell et al., "Map learning and high-speed navigation in RHINO," 1996.
[75] "Senses and abilities-north american bear center."
[76] E. Waltz, J. Llinas et al., Multisensor Data Fusion. Artech House Boston, 1990, vol. 685.
[77] D. L. Hall and S. A. McMullen, "Mathematical techniques in multisensor data fusion," Artech House, Inc., Norwood, MA, vol. 57, 1992.
[78] "Information fusion definition."
[79] W. Elmenreich, "An introduction to sensor fusion," Vienna University of Technology, Austria, vol. 502, 2002.
[80] E. Bosse, J. Roy, and D. Grenier, "Data fusion concepts applied to a suite of dissimilar sensors," in Proceedings of 1996 Canadian Conference on Electrical and Computer Engineering, vol. 2. IEEE, 1996, pp. 692–695.
[81] P. Grossmann, "Multisensor data fusion," GEC Journal of Technology, vol. 15, no. 1, pp. 27–37, 1998.
[82] N. S. Rao, "A fusion method that performs better than best sensor," Oak Ridge National Lab., TN (United States), Tech. Rep., 1998.
[83] F. Castanedo, "A review of data fusion techniques," The Scientific World Journal, vol. 2013, 2013.
[84] B. V. Dasarathy, "Sensor fusion potential exploitation-innovative architectures and illustrative applications," Proceedings of the IEEE, vol. 85, no. 1, pp. 24–38, 1997.
[85] A. N. Steinberg and C. L. Bowman, "Revisions to the jdl data fusion model," in Handbook of Multisensor Data Fusion. CRC Press, 2008, pp. 65–88.
[86] F. E. White, "Data fusion lexicon," Joint Directors of Labs Washington DC, Tech. Rep., 1991.
[87] R. E. Kalman et al., "Contributions to the theory of optimal control," Bol. Soc. Mat. Mexicana, vol. 5, no. 2, pp. 102–119, 1960.
[88] N. J. Gordon, D. J. Salmond, and A. F. Smith, "Novel approach to nonlinear/non-gaussian bayesian state estimation," in IEE Proceedings F (Radar and Signal Processing), vol. 140, no. 2. IET, 1993, pp. 107–113.
[89] M. F. Bugallo, S. Xu, and P. M. Djurić, "Performance comparison of ekf and particle filtering methods for maneuvering targets," Digital Signal Processing, vol. 17, no. 4, pp. 774–786, 2007.
[90] R. Van Der Merwe, A. Doucet, N. De Freitas, and E. A. Wan, "The unscented particle filter," in Advances in Neural Information Processing Systems, 2001, pp. 584–590.
[91] J. Carpenter, P. Clifford, and P. Fearnhead, "Improved particle filter for nonlinear problems," IEE Proceedings - Radar, Sonar and Navigation, vol. 146, no. 1, pp. 2–7, 1999.

[92] D. L. Hall and J. Llinas, "An introduction to multisensor data fusion," Proceedings of the IEEE, vol. 85, no. 1, pp. 6–23, 1997.
[93] D. L. Hall and R. J. Linn, "A taxonomy of algorithms for multisensor data fusion," in Proc. 1990 Joint Service Data Fusion Symp., 1990.
[94] R. O. Chavez-Garcia, "Multiple sensor fusion for detection, classification and tracking of moving objects in driving environments," Ph.D. dissertation, Université de Grenoble, 2014.
[95] R. L. A1. Rp lidar a1 details. [Online]. Available: http://www.ksat.com/news/alarming-40-percent-increase-in-pedestrian-deaths-in-2016-in-san-antonio
[96] ——. Rp lidar a1 details. [Online]. Available: https://click.intel.com/intelr-realsensetm-depth-camera-d435.html
[97] D. J. Campbell, "Robust and optimal methods for geometric sensor data alignment," Ph.D. dissertation, 2018.
[98] V. De Silva, J. Roche, and A. Kondoz, "Robust fusion of lidar and wide-angle camera data for autonomous mobile robots," Sensors, vol. 18, no. 8, p. 2730, 2018.
[99] F. M. Mirzaei, D. G. Kottas, and S. I. Roumeliotis, "3d lidar-camera intrinsic and extrinsic calibration: Identifiability and analytical least-squares-based initialization," The International Journal of Robotics Research, vol. 31, no. 4, pp. 452–467, 2012.
[100] W. Dong and V. Isler, "A novel method for the extrinsic calibration of a 2d laser rangefinder and a camera," IEEE Sensors Journal, vol. 18, no. 10, pp. 4200–4211, 2018.
[101] J. Li, X. He, and J. Li, "2d lidar and camera fusion in 3d modeling of indoor environment," in 2015 National Aerospace and Electronics Conference (NAECON). IEEE, 2015, pp. 379–383.
[102] L. Zhou, Z. Li, and M. Kaess, "Automatic extrinsic calibration of a camera and a 3d lidar using line and plane correspondences," in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, IROS, October 2018.
[103] P. Dollar, R. Appel, S. Belongie, and P. Perona, "Fast feature pyramids for object detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 8, pp. 1532–1545, Aug. 2014.
[104] R. Girshick, "Fast r-cnn," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
[105] S. Ren, K. He, R. Girshick, and J. Sun, "Faster r-cnn: Towards real-time object detection with region proposal networks," in Advances in Neural Information Processing Systems, 2015, pp. 91–99.
[106] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "Ssd: Single shot multibox detector," in European Conference on Computer Vision. Springer, 2016, pp. 21–37.
[107] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
[108] ——, "You only look once: Unified, real-time object detection," in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.
[109] J. Redmon and A. Farhadi, "Yolo9000: Better, faster, stronger," arXiv preprint, 2017.
[110] A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? the kitti vision benchmark suite," in 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012, pp. 3354–3361.
[111] C. R. Qi, W. Liu, C. Wu, H. Su, and L. J. Guibas, "Frustum pointnets for 3d object detection from rgb-d data," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 918–927.
[112] M. Simony, S. Milzy, K. Amendey, and H.-M. Gross, "Complex-yolo: An euler-region-proposal for real-time 3d object detection on point clouds," in Proceedings of the European Conference on Computer Vision (ECCV), 2018.
