DEPARTMENT OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

MASTER'S THESIS

CONTROL AND PROGRAMMING OF AN INDUSTRIAL ROBOT USING MICROSOFT KINECT

BY: PABLO PÉREZ TEJEDA
MAT. NO.:
PROFESSOR: Prof. Dr. Ing. Konrad Wöllhaf
SUBMISSION DATE: 31 August 2013

Acknowledgement

I would like to use this short acknowledgement to express my gratitude to my professor, Dr. Wöllhaf, for giving me hints in the analysis of the problem and for providing me with the facilities to finish my thesis; to Prof. Matthias Stark, for his help with the hardware; and finally to my family and friends, for giving me support and understanding during the long days spent in front of three computer screens.

Abstract

Industrial robot programming relies on a suitable interface between a human operator and the robot hardware. This interface has evolved over the years to facilitate the task of programming, making it possible, for example, to position a robot in real time with a handheld unit, or to design 'offline' the layout and operation of a complex industrial process in the GUI of a computer application. The different approaches to robot programming aim at increasing efficiency without impairing capabilities already expected of the programming environment, such as stability control, precision or safety. Some of these approaches have found their way into computer vision, and the latest generation of image sensors today powers many applications, inside and outside the automation industry, offering extended capabilities in a new low-cost market.

The Kinect™ sensor from Microsoft® emerged in the video game console market in 2010 as a webcam-style add-on peripheral for the Xbox 360™ console, enabling gamers to control and interact with the console through a natural user interface using gestures and spoken commands. Nevertheless, the project was also aimed at broadening the Xbox 360's audience beyond its typical gamer base, and in 2011 Microsoft released the Kinect SDK for Windows® 7, allowing developers to write Kinect applications in C++/CLI, C# and Visual Basic .NET™.

The aim of this thesis is the design, implementation, testing and documentation of software capable of identifying, tracking, locating and representing three objects in real time, based on their color characteristics, through the use of the Kinect sensor. Such a system is also the beginning of an interface for robot programming. The sensor is to be programmed in C++ using the Kinect for Windows SDK and the Desktop App UI for Windows. The OpenCV library is the tool for the image processing algorithms. For this thesis, the IDE selected for programming is Visual Studio 2012, running on a 32-bit OS [1].

[1] Other details of the hardware are the microprocessor, an AMD Phenom™ II X4 955 at 3.20 GHz, and the graphics accelerator, an NVIDIA GeForce GTS 450. The OS version is Windows 8 Premium.

Contents

1 Introduction 6
1.1 Motivation. The Use of the Natural User Interface 7
1.2 Hardware and Software Tools 8
1.3 Approach to the Problem 9
2 Segmentation Algorithm 10
2.1 Color Segmentation 11
2.2 Background Subtraction 14
2.3 Combined Effect 16
2.4 Additional Considerations 17
3 Proposed Extensions on Image Processing 20
3.1 Labeling 20
3.2 Shape Recognition 21
3.3 Video Tracking 21
4 Design and Interface for the Solution 22
4.1 System Schematic 22
4.2 Application's Flow Diagram 22
4.3 Application's Performance 24
5 Application's User Guide 26
5.1 Algorithms Bitmap 27
5.2 Object Tracking Bitmap 31
Appendix A I
Bibliography II

1 Introduction

The term "interface" is used recurrently throughout this document, so it is appropriate to include a short explanation of it. "Interface" can adopt different definitions depending on the context in which it is applied:

In computer science... "An interface is the point of interaction between a computer and any other entity, such as a printer or human operator."

More specifically, in software development... "A software interface is the layout of an application's graphic or textual controls in conjunction with the way the application responds to user activity."

In object-oriented programming... "An interface is an abstract type that contains no data, but exposes behaviors defined as methods."

As a general definition... "An interface is a surface forming a common boundary between bodies or regions."
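To make the object-oriented definition concrete, the following is a minimal C++ sketch of such an abstract type; the name FrameSource and its methods are invented here purely for illustration and are not part of the thesis code:

    // An abstract type that holds no data but exposes behaviors as methods.
    // "FrameSource" is a hypothetical name used only for this example.
    class FrameSource {
    public:
        virtual ~FrameSource() {}
        // Pure virtual methods: concrete classes (e.g. a Kinect wrapper)
        // must supply the behavior; the interface itself stores no state.
        virtual bool open() = 0;
        virtual bool nextFrame(unsigned char* buffer, int bufferSize) = 0;
    };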
Attending to the first definition, an interface is required to connect the computer to the data captured by the Kinect. This connection is provided by the Kinect Natural User Interface together with the Kinect drivers and hardware. The Kinect Natural User Interface is the part of the Kinect for Windows API focused on the acquisition of data streams from the Kinect.

Attending to the second definition, another interface is needed to permit the interaction between a computer application and its user. This connection must be designed using the Windows Desktop API. The Windows Desktop API is a set of software implementations that operates directly on the computer hardware, permitting us, for example, to display images on a screen device or to process mouse and keyboard events from the user.

To address the objectives of this thesis, we need to develop the second of the two preceding interfaces and a processing routine. With these elements, the system must be able to retrieve an input (three objects to be recognized), screen this input to obtain useful data, and show the appropriate results at an output [2]. The interfaces and the routine are embedded into a computer application and run in a single computer process. The processing routine uses C++ implementations from both of the previous APIs, from the standard C++ libraries and from the OpenCV libraries. The whole application is compiled in C++ and executed as a Win32 app. The following schematic helps to clarify the structure of the system (the elements to be developed in this thesis are marked in red).

[Schematic: Operator <-> Application (Interface + Routine) <-> Kinect Natural User Interface / Kinect API <-> Kinect sensor]

[2] The output can be any suitable representation of the position of the objects. For this thesis, two representations are provided: one showing the position of the objects as perceived by the camera; the other simulating a robotic arm that takes the positional data of the objects as kinematic parameters. The second representation is intended to demonstrate the behaviour of a hypothetical robot connected to the system.

1.1 Motivation. The Use of the Natural User Interface

A Natural User Interface (NUI), as opposed to a Graphical User Interface (GUI) or a Command-Line Interface (CLI), is based on actions that come naturally to humans, such as basic body motion or gesturing. It effectively becomes invisible to its users (as the details of its implementation remain hidden), improves performance when working with an application, and permits successive learned interactions that make the process of learning more intuitive and easily assimilated at each step. NUIs deploy interaction design concepts, such as "shape digital things for people's use", and rely on a user being able to quickly transition from novice to expert [3].

A Kinetic User Interface (KUI) allows users to interact with computing devices through the motion of objects or the human body; the Kinect is an example of a KUI. The development of this interface technology relies on the ability of interaction devices to detect an increasingly wide range of human actions and gestures.

[3] One example of a strategy for designing a NUI is the strict limiting of functionality and customization, so that users have very little to learn in the operation of a device. Provided that the default capabilities match the user's goals, the interface is effortless to use. Other approaches use "reality-based interface" (RBI) methods, whose main efforts point to a merging of cyberspace and the real world.

The Kinect NUI gives the programmer the possibility to deploy solutions that permit users to interact naturally with computers by simply gesturing and speaking. The control of the motion of a robot through one of these interfaces is known as vision-based robot control, or simply visual servoing (abbreviated VS). VS is a technique that uses feedback information extracted from a vision sensor, such as the Kinect, to control the motion of a robot. Foundational treatments of this field were published in 1996, and today a complete methodology for this technology exists.

1.2 Hardware and Software Tools

Kinect sensor and the Kinect for Windows API

The Kinect comprises hardware components (including the Kinect sensor itself and the USB hub through which it is connected to the computer), drivers to connect these hardware components (installed as part of the SDK for Windows), and audio and video components with their APIs. It features an infrared emitter and depth sensor, an RGB camera, a microphone array and an accelerometer, as well as a software pipeline that processes color, depth and skeleton data at up to 30 FPS. These data can be retrieved from the sensor in the form of media streams: one stream carries data from the color camera, another provides depth data from the depth sensor, and others carry audio data from the microphone array, skeleton data, etc. For the purposes of this thesis we only need the first two: the color stream and the depth stream. The Kinect for Windows API delivers these streams with minimal software latency across various video, CPU and device variables. Besides the Kinect for Windows API, the Kinect also makes use of implementations of media APIs available to desktop applications in Windows 7 and Windows 8.
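The following is a minimal sketch of how these two streams can be opened and polled, assuming the C-style NUI functions of the Kinect for Windows SDK 1.x (NuiApi.h, linked against Kinect10.lib) and a single attached sensor; it is not the thesis application's actual acquisition code, and error handling is reduced to early returns:

    #include <Windows.h>
    #include <NuiApi.h>  // Kinect for Windows SDK 1.x

    int main()
    {
        // Request only the two streams this thesis needs.
        if (FAILED(NuiInitialize(NUI_INITIALIZE_FLAG_USES_COLOR |
                                 NUI_INITIALIZE_FLAG_USES_DEPTH)))
            return 1;

        HANDLE colorEvent  = CreateEvent(NULL, TRUE, FALSE, NULL);
        HANDLE depthEvent  = CreateEvent(NULL, TRUE, FALSE, NULL);
        HANDLE colorStream = NULL;
        HANDLE depthStream = NULL;

        // Open the color and depth streams at 640x480.
        NuiImageStreamOpen(NUI_IMAGE_TYPE_COLOR, NUI_IMAGE_RESOLUTION_640x480,
                           0, 2, colorEvent, &colorStream);
        NuiImageStreamOpen(NUI_IMAGE_TYPE_DEPTH, NUI_IMAGE_RESOLUTION_640x480,
                           0, 2, depthEvent, &depthStream);

        // Poll one color frame (the pipeline delivers up to 30 FPS).
        if (WaitForSingleObject(colorEvent, 1000) == WAIT_OBJECT_0) {
            const NUI_IMAGE_FRAME* frame = NULL;
            if (SUCCEEDED(NuiImageStreamGetNextFrame(colorStream, 0, &frame))) {
                NUI_LOCKED_RECT rect;
                frame->pFrameTexture->LockRect(0, &rect, NULL, 0);
                // rect.pBits points to the 32-bit BGRA pixels of the frame,
                // rect.Pitch bytes per row; image processing starts here.
                frame->pFrameTexture->UnlockRect(0);
                NuiImageStreamReleaseFrame(colorStream, frame);
            }
        }

        NuiShutdown();
        return 0;
    }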
Image Processing and the OpenCV libraries

The application developed for this thesis uses concepts, algorithms and implementations from image processing and computer vision.
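To give a flavor of these tools, the following is a minimal OpenCV 2.x sketch of hue-based thresholding, the kind of color segmentation operation that Chapter 2 builds on; the function name and the HSV range (a red-like hue) are invented for illustration and are not the thesis's actual values:

    #include <opencv2/core/core.hpp>
    #include <opencv2/imgproc/imgproc.hpp>

    // Return a binary mask marking pixels whose HSV values fall inside
    // an example range; candidate object pixels become 255, the rest 0.
    cv::Mat segmentByColor(const cv::Mat& bgrFrame)
    {
        cv::Mat hsv, mask;
        cv::cvtColor(bgrFrame, hsv, CV_BGR2HSV);     // BGR -> HSV
        cv::inRange(hsv,
                    cv::Scalar(0, 120, 70),          // lower H, S, V bound
                    cv::Scalar(10, 255, 255),        // upper H, S, V bound
                    mask);
        return mask;
    }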