The effect of input modalities, different levels of task complexity and embodiment on users’ overall performance and perception in human-robot collaboration tasks.
Master’s Thesis to confer the academic degree of Dipl.-Ing. from the Faculty of Natural Sciences at the Paris-Lodron-University of Salzburg
by Gerald Stollnberger, BEng.
Supervisor of the thesis: Tscheligi, Manfred, Univ.Prof., Dr.
Advisor of the thesis: Weiss, Astrid, Dr.
Department of Computer Science
Salzburg, February 21, 2014

Acknowledgment
At this point I want to thank everyone who supported me in writing this thesis. First of all, I want to thank my professor and supervisor Univ.-Prof. Dr. Manfred Tscheligi. Without him, this thesis would not have been written. Furthermore, I especially want to thank Dr. Astrid Weiss, my advisor, for her encouragement, helpfulness, and motivation. She spent much time on this project and also got me interested in this topic and in robotics in general. I want to thank my supervisors for giving me the chance to connect my hobby (building and programming robots) with work, to have a job in research, and to get to know the robotics community. My thanks also go to my colleagues from the ICT&S Center, who always supported me with advice and assistance while writing this thesis. I am also much obliged to the people who participated in my studies. Gratitude also to my mother, my father, and my sister for their comprehensive support and patience. Furthermore, to my wife and daughter for their love and understanding during this sometimes exhausting and stressful process. Last but not least, I want to thank my friends and my university colleagues, who accompanied me during my academic studies. My heartfelt thanks go to all of you. I really appreciate your contributions. Thanks a lot!
Declaration
I hereby declare and confirm that this thesis is entirely the result of my own original work. Where other sources of information have been used, they have been indicated as such and properly acknowledged.
I further declare that this or similar work has not been submitted for credit elsewhere.
Salzburg, February 21, 2014

Signature
Abstract
Human-Robot Interaction (HRI) is a growing multidisciplinary field of science. The focus of HRI is to study the interaction between robots and humans, with contributions from social science, computer science, robotics, artificial intelligence, and many other disciplines.
Nowadays, one of the research goals of HRI is to enable robots and humans to work shoulder to shoulder. In order to avoid misunderstandings or dangerous situations, choosing the right input modality, especially for different levels of task complexity, is a crucial aspect of successful cooperation. It can be assumed that for each level of task complexity there is a complementing input modality which increases the corresponding user satisfaction and performance.
In order to identify the most appropriate input modality in relation to the level of task complexity, two user studies are presented in this thesis, as well as the complete prototyping process for the robot.
The first study took place in a public space, where participants could choose between two different input modalities (a PC-remote control and a gesture-based interface) to drive a race with a Lego Mindstorms robot against another participant. The second study was conducted in a controlled laboratory environment, where participants had to solve three assembly tasks which differed in complexity. In order to get the required parts, they had to use three different input modalities (the PC-remote control and the gesture-based interface from the first study, plus a speech control interface). Besides investigating the correlation of task complexity and input modality, this second study also explored whether the appearance of the robot has an impact on how the collaboration is perceived by the user.
One of the main findings was that all of these factors (input modality, level of task complexity, as well as the appearance of the robot) had a strong impact on the results. In addition, many interdependencies could be identified, especially between input modality and task complexity. Differences in user satisfaction measures and also in performance were often highly significant, especially for hard tasks. Furthermore, it was found that the perceived task complexity was strongly dependent on users’ cognitive workload, which in turn was driven by the input modality used; this again emphasizes the strong interdependence of task complexity and input modality. Regarding the influence of the appearance of the robot, it could be shown that the human-like shape increased users’ confidence that they could solve a task together with the robot without anyone else’s help.
Contents
1. Outline
2. Introduction
   2.1. Main Research Questions
   2.2. Research Sub-Questions
3. Related Work
4. Prototype
   4.1. The Purely Functional Robot
   4.2. The Anthropomorphic Robot
   4.3. The Input Modalities
        4.3.1. The Mobile Gesture-based Interface
        4.3.2. The Speech Control
        4.3.3. The PC-remote Control
5. User Studies
   5.1. Preliminary Study in a Public Context
        5.1.1. Methodology
        5.1.2. Measures
             5.1.2.1. Performance Measures (Behavioral Level)
             5.1.2.2. User Satisfaction Measures (Attitudinal Level)
        5.1.3. Results
             5.1.3.1. Performance Measures (Behavioral Level)
             5.1.3.2. User Satisfaction Measures (Attitudinal Level)
        5.1.4. Summary of the Preliminary Study
   5.2. Controlled Laboratory Study
        5.2.1. Methodology
        5.2.2. Measures
             5.2.2.1. Performance Measures (Behavioral Level)
             5.2.2.2. User Satisfaction Measures (Attitudinal Level)
        5.2.3. Results
             5.2.3.1. Performance Measures (Behavioral Level)
             5.2.3.2. User Satisfaction Measures (Attitudinal Level)
        5.2.4. Summary of the Laboratory Study
6. Discussion
7. Future Work
A. Appendix
   A.1. Workshop Position Paper accepted for the 2012 IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN)
   A.2. Late-breaking Report accepted for the ACM/IEEE International Conference on Human-Robot Interaction 2013
   A.3. Data Consent Form for the Preliminary Study
   A.4. Questionnaire used in the Preliminary Study
   A.5. Output of the Data Analysis of the Preliminary Study
   A.6. Data Consent Form for the Laboratory Study
   A.7. Study Cycle
   A.8. Questionnaire concerning the User Satisfaction Measures used in the Laboratory Study
   A.9. The NASA Raw Task Load Index used in the Laboratory Study
   A.10. The Godspeed Questionnaire about Anthropomorphism in German
   A.11. Full Paper accepted for the 2013 IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN)
   A.12. Output of the Data Analysis of the Laboratory Study
1. Outline
The goal of this work is to investigate whether there is an interdependency between input modalities and task complexities, and what effect it has on performance and user satisfaction in human-robot interaction (especially in a cooperative scenario). Therefore, three different input modalities were implemented to control a robot prototype, which was realized using the Lego Mindstorms NXT 2.0 toolkit.
The input modalities used in the user studies were:
• A traditional PC-remote control using the keyboard for commanding the robot,
• a mobile gesture-based interface on an Android phone which relies on the accelerometer sensor of the device,
• and a speech control system using the user’s voice directly.
In order to evaluate the approach, two user studies were conducted involving behavioral and attitudinal measures.
This thesis is structured as follows:
• Chapter 2 gives a short overview of the history of robotics, followed by the research objectives and research questions.
• Chapter 3 presents the related work, especially regarding the state of the art of input modalities in the field of HRI, and motivates the input modalities used in this work.
• Chapter 4 explains the design process of the robot prototype used for the two user studies and, moreover, describes the planning and implementation of the three input modalities. Some problems encountered during the design process are also described.
• Chapter 5 covers the two user studies (one in a public context, and one in a controlled laboratory setup) which were conducted in order to achieve the research goals and to answer the research questions. First, the study setup is described, followed by the results of each study and a short summary of the most important findings.
• In the conclusion, the research questions are answered and discussed.
• The chapter about future work contains a short description of prospective work which could be performed in order to deepen the understanding of the impact of task complexity in HRI.
• The appendix provides three accepted papers resulting from this research, all questionnaires used in the studies, and the output files of the data analysis.
2. Introduction
The history of robots is a long and diversified one (cf. [YVR11], [Bed64]). The first experiments with machines were already conducted in the ancient world. Around 420 B.C., for example, Archytas of Tarentum [Arc05], a Greek mathematician, created a wooden steam-propelled bird, which was able to fly and is said to be the first known robot (depicted in Figure 2.1).
Figure 2.1.: Archytas of Tarentum’s Dove (mobile.gruppionline.org)
Similarly, in the first century A.D. and earlier, Heron of Alexandria [Her99] described more than a hundred machines, for example automatic music machines or theaters.
With the downfall of the ancient cultures, scientific knowledge temporarily disappeared. Around 1205, Al-Jazari, an Arab engineer, wrote a book about mechanical devices, for example humanoid automata or a programmable automaton band, which later influenced Leonardo da Vinci in his research. Da Vinci [Dav06] designed a humanoid robot in the 15th century, but unfortunately the technical knowledge of the time was not sufficient to put such a machine into practice. However, some drafts of his mechanical knight (Figure 2.2) have outlasted the centuries.
Around 1740, Jacques de Vaucanson [Bed64] designed and built a machine that could play the flute, the first programmable fully automatic loom, and an automatic duck, which can be seen in Figure 2.3.
Figure 2.2.: Model of a Robot based on Drawings by Leonardo da Vinci (From Wikimedia Commons)
Figure 2.3.: Duck of Jacques de Vaucanson (From Wikimedia Commons)
In 1941, Isaac Asimov [Asi42], a science fiction author, first used the word “robotics” in one of his stories, referring to the science and technology of robots. He also became famous for his “three laws of robotics”, which he introduced in his story “Runaround”.
1. “A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey the orders given to it by human beings, except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.”
He later also added a fourth one, the “zeroth law”:
0. “A robot may not harm humanity, or, by inaction, allow humanity to come to harm.”
After the Second World War, most likely because of the invention of the transistor by Bell Laboratories, robotics made rapid advances from the technological point of view.
Matarić [Mat07] defined a robot as follows:

“A robot is an autonomous system which exists in the physical world, can sense its environment, and can act on it to achieve some goals.”

According to this definition, the first real robot was considered to be the “Tortoise” of Grey Walter, because it was the first machine which met the definition.
Around 1970, the first autonomous mobile robot, Shakey (Figure 2.4), was developed by the Stanford Research Institute.
In 1973, the construction of the humanoid robot Wabot 1 was started at Waseda University in Tokyo [Kat74], and in the same year the German robotics pioneer KUKA built FAMULUS, the world’s first industrial robot with six electromechanically driven axes [KUK73].
In 1997, the first mobile robot on Mars, Sojourner, landed on the planet [Soj97].
Figure 2.4.: The First Autonomous Mobile Robot (www.frc.ri.cmu.edu)

Nowadays, robots’ domains are manifold. On the one hand, there are highly specialized robots for industrial use. They often work in highly adapted environments, strictly separated from the humans’ working space. The main activities of industrial robots are often manual tasks like assembly, painting, inspection, or transportation. However, on the other hand, even in this context there is a trend towards collaboration with human workers [Ana13]. As a consequence, suitable methods for human-robot communication need to be provided, which is one of the main research goals of this work.
Other application areas include service robots, which often provide services directly to humans and have to orient themselves in contexts shared with human interaction partners. This again shows the importance of ways to communicate with each other in order to avoid misunderstandings or dangerous situations.
Furthermore, the toy industry has robots in its portfolio, for example the robotic dog Aibo from Sony (Figure 2.5), or the Lego Mindstorms, which were used as prototype platforms for the research in this thesis.
Looking like a game, but with a serious scientific background, robot soccer matches are held between teams consisting of the same type of robot. The objective of this research is to develop an autonomous humanoid soccer team which is able to play and win against the current world champion by 2050 [Cup13].
Figure 2.5.: Sony’s Robotic Dog Aibo (From Wikimedia Commons)
Robots also help humans with housework, for example by mowing the lawn or vacuum cleaning. Exploration robots investigate dangerous areas, such as distant planets as well as disaster areas. In medicine, robots support surgery, rehabilitation, or simple monitoring tasks in hospitals.
Robots are also used in the military context, for example uncrewed drones for scouting dangerous areas.
The fact that robots will affect our living conditions much more in the future than today stresses the importance of enabling clear communication without misunderstandings. Additionally, Hoffmann and Breazeal [HB04] propose that collaboration between humans and robots has become, and will continue to become, much more important. The research field of Human-Robot Interaction (HRI) has dealt with such research questions for several years now, but there are still many questions to answer. For example, one important aspect in the exploration of human-robot cooperation is to find out which input modalities for communicating with the robot are best suited [RBO09], [RDO11], [GS08], especially concerning different levels of task complexity.
One severe problem in Human-Robot Collaboration is often that no appropriate input modality is provided for the interaction with the robot. The challenge is that the ideal input modality can change between different situations or tasks. For example, in a noisy context, speech control may not be the most suitable way to interact with a robot, whereas in a quiet surrounding where movement together with the robot is necessary, it could be the ideal choice. The ideal input modality also depends on the task which has to be fulfilled: if the user needs one hand free for a secondary task, a PC-remote control, where both hands are required for the interaction, would not be the optimal choice, whereas a gesture-based interface, which requires just one hand, could provide the functionality needed for such a situation.
In order to find the best input modality for collaboration, the complexity level of the task also plays an important role. An easy and superficial task, like transporting a box straight ahead, could require different input possibilities than a more complex task where more precision is needed, such as surgery. As a consequence, findings from task complexity research were also included in this thesis. This is essential in order to find the best combination of input modality and task complexity for human-robot collaboration.
2.1. Main Research Questions
Therefore, the main research goal of this thesis is to explore the ideal mix of input modalities depending on different levels of task complexity, in terms of performance and user satisfaction measures. In other words, which input modality is the most appropriate for which level of complexity? For example, a speech control system could possibly be the best choice for easy tasks, whereas a gesture-based interface could be more appropriate for complex activities. Furthermore, it is investigated whether minimal human-like cues, added to a purely functional robot in order to suggest anthropomorphism, have an effect on the interaction/collaboration.
2.2. Research Sub-Questions
1. What interdependencies between input modalities and task complexities can be identified?
2. How do users perceive the different input modalities in terms of user satisfaction measures?
3. Are the means of interaction provided by the different input modalities effective for the human and the robot?
4. Are the means of interaction provided by the different input modalities efficient for the human and the robot?
5. How do the different input modalities perform for different task complexities?
The research sub-questions are used in order to assess the main research goal and to provide a broader perspective about the research in this work. The research questions will be answered in Chapter 6.
However, the first step towards the research goal was an intensive literature search, which was used as a starting point for this thesis. The following chapter presents the most important work related to this area of research.
3. Related Work
In order to enable successful human-robot collaboration, many factors have to be consid- ered:
• The context in which the interaction takes place,
• the type of the robot,
• the experience of the user,
• the task which has to be fulfilled,
• and many other circumstances.
Especially the input modality chosen for the cooperation is an essential aspect for a satisfying interaction, as there are many different possibilities to interact with robots nowadays. Much research effort has already been spent on investigating different ways of commanding robotic systems:
• Different handheld devices [RDO11],
• tangible interfaces [GS08],
• or different control modalities at the same time [BGR+09].
Some researchers also investigate using speech for commanding robotic systems [CSSW10], [AN04]. All in all, each input modality has advantages and disadvantages in specific situations, therefore further investigation is necessary.
Cantrell et al. [CSSW10], for example, worked on the issue of teaching natural language to an artificial system. They conducted a human-human interaction experiment with the purpose of collecting data about verbal utterances and disfluencies of natural language. They mentioned that most speech recognition systems are based on a sequential approach and make only limited use of goal structures, context, and task knowledge, which is essential for achieving a natural-language-like interaction. They identified a number of problems, such as disfluencies, omissions, and the grounding of referents, which make it complicated for an artificial system to understand natural language. Although they believed that “no HRI architecture is currently capable of handling completely unrestricted natural language”, they proposed a very promising integrated architecture for robust spoken instruction understanding, using incremental parsing, incremental semantic analysis, disfluency analysis, and situated reference resolution in addition to the “classical” speech recognition, with the result that they were able to handle most of the above-mentioned problems quite well. The authors consider their work to be “an important step in the direction of achieving natural human-robot interactions”. All in all, their system can handle a wide variety of spoken utterances, but it is still not free of errors. Therefore, it was decided to use a sequential approach with strictly predefined commands for the speech recognition system in the user study in this thesis, to avoid some of the above-mentioned problems, for example the grounding of referents. Furthermore, it is mentioned that a small set of commands improves the accuracy of speech recognition systems, which fully meets the requirements for the studies in this thesis.
Ayres and colleagues [AN04] implemented a speech recognition system, which they studied using Lego Mindstorms robots. Their system could control a robotic device with a JVM via a WiFi/WLAN/TCP-IP network from a PC workstation or a mobile handheld PC (iPaq). They mention that speech recognition systems are traditionally written mainly in C, but due to recent advances in Java, especially in the Java Speech API, this programming language has also become suitable for speech recognition engineering. In Figure 3.1, several implemented architectures are depicted. All of them have in common that the voice signal is first acquired and processed by a PC workstation or the mobile handheld PC (iPaq), and that all of them use different kinds of speech recognition systems, like Sphinx 2 or the JSAPI Cloudgarden API. The acquired and recognized speech signals are then sent via WLAN to a Linux server, which searches the grammar for possible commands and redirects them, again via WLAN, to the robot.
Figure 3.1.: The Different Speech Recognition Architectures (cf. Ayres [AN04], S. 3)

The results were quite satisfying, and they confirmed their hypothesis that Java has become a suitable tool for speech recognition engineering; however, further implementation and testing work is required. Furthermore, they also mentioned the possibility of using Bluetooth for communication instead of WLAN, but due to its lower range and bandwidth, and the higher programming effort (Java APIs were not readily available), they claimed that WLAN “provides a more effective networking solution”. However, for the studies in this thesis, range and bandwidth are not very important, therefore it was decided to explore a similar speech recognition approach, combining the advantages of Bluetooth with the platform-independent Java programming language.
Obviously, speech is just one of the many possibilities to interact with robots. Other input modalities, such as gestures or keyboards, have shown promising results too. Rouanet and colleagues [RBO09], for example, compared three different handheld input devices to control Sony’s Aibo robot and to show it different objects. For this, they inspected a keyboard-like and a gesture-based interface on the iPhone, and furthermore a tangible gesture-based interface using Nintendo’s Wiimote. The reason why they chose the AIBO robot was that “Due to its zoomorphic and domestic aspect, non-expert users perceived it as a pleasant and entertaining robot, so it allows really easy and natural interaction with most users and keeps the study not too annoying for testers”. They conducted an experiment with non-expert users in a domestic environment, where participants had to control the robot through two tracks, which differed in complexity and can be seen in Figure 3.2. The results showed that all input modalities were roughly equally efficient and all were considered satisfying. The iPhone gesture-based interface in particular was preferred by the users, whereas the Wiimote was rated rather poorly, although especially for the hard course the mean completion time was slightly lower with it. According to the authors, this advantage could be explained by the user’s ability to always keep the focus on the robot. All in all, their input modalities suffered from a small delay, which means that the reaction of the robot happened a few milliseconds after a command was given. Therefore, they added visual feedback in the form of lights on the head of the robot, as an abstraction layer to help users compare the interfaces themselves instead of their underlying system.

Figure 3.2.: The Two Obstacle Courses, the Dotted Line is the Hard One, the Other Line the Easy One (cf. Rouanet [RBO09], S. 5)
Rouanet et al. [RDO11] continued their research and conducted a real-world user study with the aim of teaching robots new visual objects. Three input devices based on mediator objects with different kinds of feedback (Wiimote, laser pointer, and iPhone) were used. Furthermore, a gesture-based interface was simulated with a Wizard-of-Oz recognition system, to provide a more natural way of interaction. The iPhone interface was based on a video stream from the robot’s camera, which allowed participants to monitor the view of the robot directly. However, it was mentioned that the splitting of direct and indirect monitoring of the robot increased the users’ cognitive load. The Wiimote interface was based on the directional cross to move the robot. The laser pointer interface used its light to draw the robot’s attention directly in one direction. Both of these interfaces (Wiimote and laser pointer) allowed participants to focus their attention directly on the robot. The gesture-based interface, using a Wizard-of-Oz framework, provided the possibility to guide the robot by hand or arm gestures (Figure 3.3). They designed their study as a robotic game, with the purpose of maintaining the users’ motivation. During their experiment, they found out that although the gesture-based interface had a lower usability, it was more entertaining for the users. It also increased the feeling of cooperation between the participants and the robot, whereas the usability was better when using the iPhone interface, especially for non-expert users, which could possibly be because of the visual feedback of the device. All in all, their work showed that although feedback increases usability, it can on the other hand be negative for the cognitive workload. Due to the fact that the gesture-based interface, which had the lowest usability, was considered to be the most entertaining modality for participants, and could thus offer a better overall user experience, further investigation of this kind of input modality makes sense.

Figure 3.3.: Robot Guided by Gestures (cf. Rouanet [RDO11], S. 4)
Cheng Guo et al. [GS08] suggested a tangible user interface for human-robot interaction with Sony’s robotic dog AIBO. They used two input devices, namely a keypad interface and a gesture-based interface with a Nintendo Wii controller. They believed that a more natural way of interaction can be achieved by allowing users to focus their attention on high-level task planning rather than on low-level steps; therefore, a suitable input device is needed. They conducted a study comparing the gesture-based interface and the keypad interaction device, solving two different tasks with two difficulty levels each:
• Navigation task: Participants were asked to navigate the AIBO robot through two obstacle courses. For the easy task, participants could freely choose the combinations of actions they needed. When solving the hard task, they were forced to use rotation and strafing, in addition to walking and turning, for a successful completion of the course.
• Posture task: Users had to perform a number of different postures with the forelegs of the robot (cf. Figure 3.4). For the easy task, only one foreleg had to be manipulated, whereas for the hard task, both forelegs had to be transformed in order to complete the transition.
Figure 3.4.: Possible Postures for each Foreleg of the AIBO Robot (cf. Guo [GS08], S. 6)

One of their research questions was whether gesture-based input methods could provide a more efficient human-robot interaction in comparison to more conservative devices like keyboards, mice, or joysticks. They mentioned that the most important advantage of their suggested tangible user interface was that it provided the affordances of physical objects. This advantage was explained by the fact that participants “do not need to focus on their hands while performing a posture. They are naturally aware of the spatial location of their hands.” The results of their user studies showed that the gesture-based input outperformed the keyboard device in terms of task completion times in both types of tasks, and that it was a more natural way of interaction. The fact that the keyboard interface required more attention shifts between the robot and the device is another argument for the Wii control. Although most of the users mentioned that they were more familiar with the keyboard - due to their computer game experience - they would prefer the gesture-based interface because of its higher intuitiveness. On the other hand, it was mentioned that if there had been a training session for both devices, the keypad would have outperformed the gesture-based interface. Furthermore, the authors argued that their results are not always transferable to these types of tasks, because they did not use the most intuitive mapping of keys on the keyboard, as shown in Figure 3.5, which could have affected their data.
Figure 3.5.: The Mapping of the Keys (cf. Guo [GS08], S. 6)
Participants thus believed that a training session would have strengthened the results of the keypad interface. Furthermore, the statement that not the most intuitive mapping of keys on the keyboard was used, and the fact that a kind of tangible user interface (the Wiimote) allowed users more high-level task planning, led to the decision to investigate this kind of input modality further in the studies.
Bannat et al. [BGR+09] proposed that a more natural and flexible human-robot interaction could be achieved by providing different control modalities at the same time, namely speech, gaze, and so-called soft buttons. Users can thus choose which modality suits a specific situation best, for example using speech when both hands are needed for an assembly step (cf. Figure 3.6). They conducted a study in a factory setup: the worker had to solve a hybrid assembly scenario, in which the assembly steps should be solved in collaboration with the robot serving as an assistant or as a fully autonomous assembly unit. The aim of their research is to “establish the so-called Cognitive Factory as the manufacturing concept for the 21st century.” Their approach will be evaluated in experiments with humans in their future work.
Figure 3.6.: Three Independent Communication Channels Between Human and Robot (cf. Bannat [BGR+09], S. 2)
However, this concept has not been experimentally validated so far and the possibility of using more than one input modality at the same time could possibly lead to a higher cognitive workload.
Hayes et al. [HHA10] tried to develop an optimized multi-touch interface for human-robot interaction, using a map on a screen to control the robot. According to the researchers, static input devices like mouse and keyboard often do not result in a satisfying form of interaction for commanding robots, especially in situations where the user needs their hands free for a secondary task, or where it is necessary to walk around together with the robot. Their multi-touch interface resulted in a better usability, faster completion times, a lower overall workload, and reduced frustration in comparison to static input devices like mouse and keyboard. They believed that their work was just a first step towards developing a successful multi-touch interaction paradigm for human-robot interaction, and they are willing to develop additional touch capabilities. They argued that it is essential to evaluate their interface with participants who are on the move during the interaction. Their work again leads to taking more flexible input devices, like a gesture-based mobile interface or a speech control system, into account. To evaluate their usefulness, it is inevitable to compare them to static input modalities such as a PC-remote.
Another approach was investigated by Pollard and colleagues [PHRA02]. They evaluated the possibility of using pre-recorded human motion and trajectory tracking as an input modality for anthropomorphic robots. They used a system with several cameras to gather data about the participants’ gestures, and conducted an experiment with professionally trained actors to capture their motions, in order to transfer them to a Sarcos robot as in Figure 3.7. They came to the conclusion that the biggest limitation of their approach was that the degree of freedom of each joint was inspected separately. This sometimes led to situations where one joint exceeded its limit and another did not, with the result that the motion of the robot did not match the motion of the actor. They planned a second experiment, in which they want to show participants different instances of actors performing given motions and let the participants decide whether the robot motion has been derived from the actor’s motion or not. An evaluation will show whether the robot’s motions have been successfully retained from the actors’ movements.
Figure 3.7.: The Sarcos Robot and a Professional Actor (cf. Pollard [PHRA02], S. 1)
On the one hand, the approach seemed to be very innovative and promising; on the other hand, it seems to take a lot of time to record and process human motions, which is furthermore a very complex activity. Therefore, this kind of interaction paradigm will not be used in the studies.
Although a lot of research has been done on human-robot collaboration, and especially on how to control robots, there are still many things to discover. Summarizing, many researchers mentioned that speech is the most natural way of interaction between humans; therefore, it could possibly be the most suitable control method for robots too. Several experiments on speech as an input modality for robots achieved quite positive results [AN04], [CSSW10], [WBS+09], [BGR+09], but the studies and measures were often not very detailed or not tested in a complete user study. Therefore, further investigation makes sense, especially in collaboration scenarios, where increasing usability and user experience could provide an immense benefit for companies in terms of time, money, and the overall satisfaction of the workers. For the studies presented in this thesis, it is not necessary to implement an interface which parses natural language; only about five commands have to be understood, which increases the robustness of such a system a lot. Understandably, there have to be comparable input systems in order to find arguments on whether a given input device is suitable for different scenarios like superficial or more complex tasks. The other two input modalities which are investigated are an Android application as a gesture-based interface and a PC-remote control. All in all, three very different devices, which offer the possibility to find well-founded arguments for comparing them, are investigated. Furthermore, it is a fact that different appearances of robots could also have an impact on the results, as in the work of Groom et al. [GTON09]. As a consequence, this factor will also be taken into account.
All in all, what has been left out so far - according to the literature review - is a structured investigation of the interplay between: (1) input modality, (2) task complexity, and (3) appearance of the robot (functional vs. human-like). This means that it is essential not to investigate these factors separately, but combined in structured user studies. Therefore, the interdependencies between them are especially in the focus of this research. The goal is to find the most appropriate mixture of input device and robot appearance for different levels of task complexity, in terms of the resulting user satisfaction and the overall performance.
In order to assess these interdependencies in user studies, a robot prototype, as well as various input modalities to study, were needed. The design as well as the implementation of the prototype and its input modalities are described in the following chapter.
4. Prototype
4.1. The Purely Functional Robot
For the research presented in this thesis, a robot was needed which is able to fulfill pick-and-place and transportation tasks. These kinds of tasks are common, for example, in a factory context and are needed for collaborative interaction scenarios. It was decided to use the Lego Mindstorms NXT 2.0 toolkit [Wik13b], which was launched on August 5, 2009, for several reasons.
The Lego Mindstorms were developed through a partnership between Lego and the MIT Media Laboratory, also with the goal of being used, besides as a toy, as teaching material for programming. It is a good example of an embedded system, consisting of a programmable brick computer (with a 32-bit microprocessor and the possibility to use four sensors and three motors concurrently), three servomotors, different sensors (like an ultrasonic sensor to measure distances, a light sensor to gather information about six different colours or the intensity of light, two touch sensors which react to button presses, a sound sensor which enables the robot to receive information about noise, and many more), and Lego Technic parts (cf. Figure 4.1). Furthermore, the robot is able to provide aural feedback via an 8-bit speaker and visual output on a 100x64 pixel display.
It is possible to replicate nearly any conceivable mechanical system and also to construct autonomous or interactive systems with Lego Mindstorms, which led to its induction into the Robot Hall of Fame in 2008 [Wik13c]. In addition to that, it is a more or less affordable robot, also available on the consumer market. Although it is primarily a toy, it provides the freedom to use nearly any of the most common programming languages for writing routines for the robot. Summarizing, the flexibility, affordability, and quality led to the decision to use Lego Mindstorms for this research.
Figure 4.1.: The Programmable Brick Computer with Motors and Sensors Attached (cf. Lego.com [Leg13])

A first functional prototype was constructed, which was able to drive around by using two motors. The problem was that only one motor was left to enable the robot to pick and place things, which was essential for the user studies. As a consequence, an intensive search for literature and ideas followed, with the discovery of “The Snatcher” by Laurens Valk. It is a functional robot with a grabbing arm, which served as an inspiration for using this grabbing arm for the prototype too. In his book [Val10], Valk provides a detailed description of how to build this grabbing arm with just one motor, which becomes possible through a clever combination of gears, beams, and transmission of power. The arm worked well, but after a few tests, strange noises produced by the motor of the grabbing arm were noticed whenever the arm reached its highest or lowest position. They were caused by the motor trying to overcome the resistance of physical barriers. In order to avoid damage to the motor, a minimum and a maximum point were defined to restrict the movement of the grabbing arm motor. This was carried out by using the ultrasonic sensor and one touch sensor to indicate the respective end points (cf. Figure 4.2). For example, when the ultrasonic sensor reported a distance lower than 1 cm, the grabbing arm was at the lowest allowed position and the motor should stop. On the opposite side, the touch sensor was pressed when the arm reached the highest position and triggered the arm to stop.
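The end-stop logic itself is simple. The following is a minimal Java sketch of it in the style of the leJOS NXJ API, given only for readability; the actual routines ran as NXT-G programs on the standard firmware, so the sensor ports and the motor letter used here are assumptions for illustration only.

import lejos.nxt.Motor;
import lejos.nxt.SensorPort;
import lejos.nxt.TouchSensor;
import lejos.nxt.UltrasonicSensor;

// Illustrative sketch of the grabbing-arm end stops described above.
// The real routines were written in NXT-G; ports and motor letter are assumed.
public class GrabbingArm {

    private final UltrasonicSensor lowEnd = new UltrasonicSensor(SensorPort.S1); // marks the lowest position
    private final TouchSensor highEnd = new TouchSensor(SensorPort.S2);          // pressed at the highest position

    /** Lift the arm until the touch sensor at the top is pressed. */
    public void lift() {
        Motor.C.forward();
        while (!highEnd.isPressed()) {
            Thread.yield();
        }
        Motor.C.stop();
    }

    /** Lower the arm until the ultrasonic sensor reports less than 1 cm. */
    public void lower() {
        Motor.C.backward();
        while (lowEnd.getDistance() >= 1) {
            Thread.yield();
        }
        Motor.C.stop();
    }
}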
Furthermore, a sound sensor was added to enable the robot to receive information about noise and, of course, sound commands. Unfortunately, the sound sensor was only capable of measuring the volume, and it was not possible to record actual sounds with it. The result of the prototyping process was a purely functional robot design, which can be seen in Figure 4.3.
After a few short tests it was realised that the road grip was not adequate, and as a consequence it was hard to control the robot precisely. This problem led to the decision to change from wheels to tracks, which improved the handling a lot.
Figure 4.2.: The Ultrasonic Sensor Indicates the Minimum Point, the Touch Sensor the Maximum Point
Figure 4.3.: The Functional Prototype with the Grabbing Arm
4.2. The Anthropomorphic Robot
In order to gain insight into the impact of the robot’s appearance on the interaction, it was investigated whether minimal anthropomorphic cues, added to the robot, would lead to differences in the results. Therefore, the appearance of the robot was altered by adding a head to the purely functional robot to suggest anthropomorphism. The question was whether such minimalistic cues are sufficient to affect the results concerning user satisfaction measures, or even the performance. The head was an outcome of a 3D-printing workshop, which took place at the Computer Science Department of the University of Salzburg and was run by a member of Otelo [Ote13]. In the workshop, which lasted two days, three fully working 3D printers were built and tested, and for the head of the robot a model designed by Neophyte was used.1 The origin of the head can be seen in Figure 4.4.
Figure 4.4.: The 3D Printer is Processing the Robot’s Head
The head was then mounted on the functional prototype by using some Lego bricks. In Figure 4.5, a first concept and the prototype with anthropomorphic cues are depicted.
4.3. The Input Modalities
After the robot prototype was finished, three different input modalities were implemented. All of them had to be able to control the robot to drive in any desired direction, and furthermore to lift and release small objects using the grasping arm of the prototype.
1http://www.thingiverse.com/thing:8075
Figure 4.5.: A First Concept and the Prototype with Anthropomorphic Cues
Following the literature review in Chapter 3, it was decided to provide
• a gesture-based interface on an Android mobile phone,
• a speech control system,
• and, last but not least, a “classical” PC-remote interface, using a standard keyboard for the interaction with the robot.
The following section gives an overview of how the different input modalities were implemented, and which technologies were used and, especially, why.
4.3.1. The Mobile Gesture-based Interface
During the conception of the different input possibilities, the MINDdroid application was discovered, which offered nearly all of the functionality needed for the gesture-based interface. The executable and the source code are freely available under the GPL v3 license2. The source code can be downloaded from the GitHub webpage [H13].
The robot is controlled using the data of the accelerometer sensor of the smartphone; in other words, the direction of the robot is determined by the relative angle of the mobile to the
2http://www.gnu.org/licenses/gpl-3.0.html
ground. By using this data, it is possible to control two motors of Lego Mindstorms robots. Furthermore, the application provides visual feedback on the screen and haptic feedback by vibrations. The application also provides the possibility to start programs which are installed directly on the Lego Mindstorms, using the standard firmware or others. This happens via the action button on the touchscreen interface. As can be seen in Figure 4.6, the relative position of the yellow square indicates the direction in which the robot is moving. If the yellow square is within the grey one, the robot stops. Bluetooth is used for the communication with the robot and for sending the commands.
Figure 4.6.: The MINDdroid Application (cf. github.com [H13])
In order to allow a more exact comparison between the different input modalities, the source code was modified to meet specific requirements. The modification consisted of limiting the maximum speed in order to make it equal to that of the other input modalities - the speech control interface and the PC-remote control. Performance measures would have had hardly any value if one input modality allowed the robot to go faster or slower than the others.
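As an illustration of the kind of change this involved, the following sketch clamps the motor power derived from the phone’s tilt before it is sent to the robot; the class name, method name, and the concrete limit are hypothetical and do not reflect the actual MINDdroid code.

// Hypothetical sketch of the speed limitation added to the gesture-based interface;
// names and the chosen limit are assumptions, not the actual MINDdroid code.
public final class SpeedLimiter {

    // Upper bound on motor power (in percent), chosen so that all three
    // input modalities drive the robot at the same maximum speed.
    private static final int MAX_POWER = 60;

    /** Clamp a raw motor power value derived from the phone's tilt angle. */
    public static int limit(int rawPower) {
        if (rawPower > MAX_POWER) {
            return MAX_POWER;
        }
        if (rawPower < -MAX_POWER) {
            return -MAX_POWER;
        }
        return rawPower;
    }
}

In such a setup, a call like limit(leftPower) and limit(rightPower) would be applied to both motor values right before the Bluetooth command is composed.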
Due to the fact that a Samsung Galaxy S2 mobile phone was used for the studies, the app was compiled for Android 4.1 Jelly Bean with API level 16 [And13], which was the newest version at that time. For all the programming work, the integrated development environment Eclipse 4.2.0 Juno [Ecl13] and the ADT plugin [ADT13] were used, which provide an integrated environment for developers of Android applications within Eclipse.
At this stage, the robot was just able to drive around; therefore, the functionality for the grasping arm was added. For this purpose, two small routines were implemented, namely to lift and to release the grasping arm. For these routines, the programming language NXT-G, which is provided by the NXT 2.0 standard base toolkit, was used. It features programming in a graphical and interactive drag-and-drop environment, which can be seen in Figure 4.8. NXT-G is based on LabVIEW, an industrial programming standard created by National Instruments using data flow programming (cf. [Wik13b]). The programs ran directly on the Mindstorms robot, and no change of the standard firmware was necessary, which was indeed another argument for using the NXT-G language.
4.3.2. The Speech Control
Several reasons led to the conclusion that further investigation of speech control was promising. According to the work of Hearst et al. [Hea11], users prefer to speak rather than type. A good interface can bridge the gap in the back end, which makes systems more usable for less experienced people. Additionally, such an input modality provides the advantage that it is possible to move along with the robot for better supervision.
When it came to the implementation of speech control, the first idea was to implement the system directly on the robot, in order to avoid the need for additional equipment. The flexibility of a directly embedded speech control seemed to be a big advantage; therefore, the sound sensor was attached to the robot and programming in NXT-G was started. The sound sensor shown in Figure 4.7 is, according to the manufacturer, able to detect noise levels in decibels (frequency range from 3 to 6 kHz), as well as to identify sound patterns and tone differences (cf. Lego Shop [LSh13]).
The robot should “understand” 7 commands: moving forward and backward, turning left and right, stopping, and lifting or releasing the grasping arm. Following Alan Dix [DFAB98], humans are only able to remember 7±2 chunks of information, due to the limited capacity of short-term memory. As a consequence, it was decided to use the same number of commands for the robot, which was sufficient to perform the required navigation and grasping tasks.
First, thresholds were defined. The noise values were scaled from 1 (silent) to 100 (loud) in order to filter out some of the background noise. Everything quieter than a noise value of 20 was identified as background noise and therefore ignored. Noises between 20 and 59 were recognized as sound commands, and for safety reasons, any sound louder than 60 triggered the stop command on the robot. This was implemented because in a noisy context the robot could probably become uncontrollable, and as a consequence the safest option for the robot as well as for the user is to stop the robot automatically.
Figure 4.7.: The Lego Mindstorms Sound Sensor (cf. [LSh13])

When the sensor detected a sound within the volume range of sound commands, the robot recorded the sound for half a second, which means that the sound level was stored for each millisecond. The outcome was a graph where the x-value indicated the time and the y-value the noise volume level. By analyzing the peaks of this graph, a fairly exact differentiation of the sound commands became possible. Depending on the result of this analysis, the respective action on the robot was triggered. After the action was started, the program started again, because the whole program was wrapped in an infinite loop.
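The following sketch summarizes this logic in Java-like form. The actual program was an NXT-G program, so the helper methods below merely stand in for the corresponding graphical blocks and are not part of any real API; only the thresholds and the half-second recording window are taken from the description above.

// Illustrative sketch of the on-robot sound-command logic described above.
// readSoundLevel(), classifyPeaks() etc. are placeholders for NXT-G blocks.
public class SoundCommandLoop {

    public void run() {
        while (true) {                       // the whole program is one infinite loop
            int level = readSoundLevel();    // scaled 1 (silent) .. 100 (loud)

            if (level < 20) {
                continue;                    // background noise: ignore
            }
            if (level >= 60) {
                stopRobot();                 // safety rule: very loud noise stops the robot
                continue;
            }

            // 20..59: record the sound level once per millisecond for half a second
            int[] samples = new int[500];
            for (int ms = 0; ms < samples.length; ms++) {
                samples[ms] = readSoundLevel();
                sleepOneMillisecond();
            }

            // Differentiate the commands by the peaks of the recorded curve
            // and trigger the corresponding action on the robot.
            executeCommand(classifyPeaks(samples));
        }
    }

    // Placeholders for the NXT-G blocks; not part of any real API.
    private int readSoundLevel() { return 0; }
    private void sleepOneMillisecond() { }
    private void stopRobot() { }
    private int classifyPeaks(int[] samples) { return 0; }
    private void executeCommand(int command) { }
}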
All in all, the speech recognition worked quite well, but unfortunately only for controlling the robot forward, left, right, and to stop. When it came to adding the further commands needed for the grasping arm, the accuracy of the recognition decreased a lot. A small excerpt of the NXT-G program can be seen in Figure 4.8, just to give an idea of what an NXT-G program looks like.

Figure 4.8.: An Excerpt of the First Speech Recognition Approach written in NXT-G
As a consequence, it was decided to try an alternative approach, namely a Java program running on a laptop, which on the one hand meant giving up the advantage of a system directly embedded on the robot, but on the other hand provided many more programming possibilities, and hopefully a higher accuracy and robustness of the speech recognition system. From the speech control system’s point of view, the interaction between the user and the robot was divided into the following necessary steps:
1. Acquire the spoken signal
2. Process and recognize the signal
3. Transform the recognized token into a command the robot is able to handle
4. Send the command to the robot
5. Start again with 1.
Step 1 was simply done by using the built-in microphone of a Lenovo ThinkPad R61 notebook with the standard Windows microphone settings. The speech signal was then passed to the Java program, where it was processed in Step 2 by a speech recognition framework.
There are several open-source speech recognition engines available on the Internet, but it was decided to use Sphinx [Sph13], which was released under a BSD-style license by Carnegie Mellon University and others, and seemed to be very promising [WLK+04].
Sphinx is available in different versions:
• Sphinx 2 is a speaker-independent, real-time, large-vocabulary recognizer, especially suited for mobile applications and based on the C programming language.
• Sphinx 3 is more accurate than Sphinx 2, but needs more resources; it is also based on C.
• Sphinx 4 is a Java version of the toolkit and was therefore the first choice.
• Pocketsphinx is a C-based recognizer library, especially suited for modern smartphones, and the newest Sphinx release.
• There are some more versions available, which are not discussed further in this thesis.
In 2004, Sphinx 4 was still in development, suffered from many compile errors, and lacked documentation; as a consequence, Ayres et al. [AN04] failed to achieve satisfying results with it. In recent years, Sphinx 4 has made huge advances, hence it was reasonable to use it for the speech recognition system.
The speech recognition relies on a BNF-style grammar3 in the Java Speech API Grammar Format (JSGF) [JSG13], a platform-independent grammar in textual form, especially intended for use in speech recognition and engineering systems. In speech recognition systems, this grammar needs to be specified in order to determine what the system should be capable of understanding, and of course to describe the utterances users may and should use. Since 7 commands were necessary, a simple JSGF-compatible grammar for the commands “forward”, “backward”, “pause”, “lift”, “release”, “turn left”, and “turn right” was created:
#JSGF V1.0;

/**
 * JSGF Grammar for commanding the robot to move
 * around and to lift and release objects
 */
grammar commandRobot;

// The rule name <command> is reconstructed; the alternatives are the seven commands named above.
public <command> = forward | backward | pause | lift | release | turn left | turn right;
At the beginning, a few alternative words were tried out: within the grammar, for example, “stop” instead of “pause”, or “go forward” instead of “forward”, were defined, but the accuracy of the speech recognition turned out to be best with the words used in the end. The outcome of Step 2 was a string which matched one of the commands defined in the grammar.
This string was used to decide which command should be triggered on the robot in Step 3. To this end, the program used the iCommand API v0.7 [Ico13], which offers the possibility of using Bluetooth for communication with the robot. For completing Step 4, iCommand provides several possibilities for Bluetooth communication with Lego Mindstorms, like RXTX, BlueZ, or BlueCove. The advantage of RXTX is that it should be possible to simply address the robot via its COM port (e.g. COM4). Unfortunately, the connection
3http://en.wikipedia.org/wiki/Backus-Naur_Form
sometimes got lost during the interaction for unknown reasons. As a consequence, it was switched to BlueCove [Blu13], a Java library which uses the Microsoft Bluetooth stack directly, and this worked out really well.
Summarizing, the system, consisting of a notebook and the Lego Mindstorms robot, offered the possibility to control the robot by speech commands.
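To make the five steps above concrete, the following sketch shows what such a recognition loop can look like with the Sphinx 4 API. The configuration file name and the sendToRobot() helper (standing in for the iCommand/BlueCove part) are assumptions for illustration and not the exact code used in the studies.

import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

// Illustrative sketch of the speech-control loop (Steps 1-5); the configuration
// file and sendToRobot() are placeholders, not the code used in the studies.
public class SpeechControl {

    public static void main(String[] args) {
        // Load the Sphinx 4 configuration, which references the JSGF grammar above.
        ConfigurationManager cm =
                new ConfigurationManager(SpeechControl.class.getResource("robot.config.xml"));

        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        Microphone microphone = (Microphone) cm.lookup("microphone");
        microphone.startRecording();

        while (true) {
            // Steps 1 and 2: acquire the spoken signal and recognize it.
            Result result = recognizer.recognize();
            if (result == null) {
                continue;
            }
            String token = result.getBestFinalResultNoFiller();

            // Steps 3 and 4: map the recognized token to a robot command
            // and send it to the robot via Bluetooth.
            if (token != null && !token.isEmpty()) {
                sendToRobot(token);
            }
            // Step 5: the loop starts over with the next utterance.
        }
    }

    // Placeholder for the iCommand/BlueCove transmission; not a real API call.
    private static void sendToRobot(String token) {
        System.out.println("Command sent: " + token);
    }
}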
4.3.3. The PC-remote Control
The third input modality, implemented in order to compare all three in a human-robot collaboration task, was a “classical” PC-remote using the keyboard of a notebook. It was a by-product of the speech control system and also written in Java. During the development of the speech control, a key listener was implemented for testing and debugging reasons, which was also suitable for the PC-remote. The communication with the robot worked in exactly the same way as for the speech control modality, in other words the iCommand API v0.7 [Ico13] was used for composing commands understandable to the robot, and BlueCove [Blu13] for their transmission. The only differences were:
• Commands were not triggered by speech, but by button presses on the keyboard.
• Buttons needed to be kept pressed in order to make the robot move. When the keys were released, the robot stopped, unlike with the speech modality, where the robot moved until it actually received the stop command. For this functionality, the methods keyPressed() and keyReleased() of the KeyListener interface were used:
public void keyPressed(KeyEvent e) {
    int kc = e.getKeyCode();

    switch (kc) {
        case java.awt.event.KeyEvent.VK_UP:
            command = COMMAND_FORWARDS;
            break;
        case java.awt.event.KeyEvent.VK_DOWN:
            command = COMMAND_BACKWARDS;
            break;
        case java.awt.event.KeyEvent.VK_LEFT:
            command = COMMAND_LEFT;
            break;
        case java.awt.event.KeyEvent.VK_RIGHT:
            command = COMMAND_RIGHT;
            break;
        case java.awt.event.KeyEvent.VK_G:
            command = COMMAND_GRAB;
            break;
        case java.awt.event.KeyEvent.VK_R:
            command = COMMAND_RELEASE;
            break;
        default:
            command = COMMAND_NONE;
            break;
    }
}

public void keyReleased(KeyEvent e) {
    command = COMMAND_NONE;
}
For directing the robot, the arrow keys on the keyboard were used, whereas for the grasping arm the keys G (“Grab”) and R (“Release”) were chosen.
This chapter explained the design process of the robot prototype needed for the two user studies and described the planning and implementation of the three input modalities. Some problems during the design process were also described, such as the limited capability of the sound sensor or the problem with the Bluetooth communication using RXTX. The following chapter illustrates the two user studies (one in a public context and one in a controlled laboratory setup) which were conducted in order to achieve the research goals and to answer the research questions. First, the study setup is described, followed by the results of each study and a short summary of the most important findings.
5. User Studies
To enable a better understanding of human-robot interaction/collaboration, especially of the interdependencies between:
• different input modalities,
• different task complexities,
• and different appearances of the robot prototype,

two user studies were conducted. The aim was to identify which combination of input modality and robot appearance works best in terms of performance and user satisfaction for human-robot collaboration tasks which differ in complexity. This knowledge would make it possible to define suggestions and guidelines regarding suitable input modalities for specific types of tasks and difficulty levels. During the two studies, users were invited to solve different tasks together with the robot.
• At the beginning of June 2012, during the 50th anniversary of the Paris Lodron University of Salzburg, a preliminary study was conducted at the ICT&S Center, where participants were invited to compete against each other in a Lego Mindstorm race.
• Furthermore, in a controlled laboratory study, users had to perform a collaborative task of building a house out of Lego bricks, together with the robot.
5.1. Preliminary Study in a Public Context
In order to explore how input modality, task difficulty, and the appearance of the robot interplay in terms of performance and user satisfaction, a user study was set up at the ICT&S Center during the 50th anniversary of the Paris Lodron University of Salzburg.
[Results of the preliminary study have already been published as a workshop position paper [SWT12]; in addition, a late-breaking report [SWT13b] was successfully submitted and accepted. The publications can be found in Appendix A.1 and Appendix A.2.]
5.1.1. Methodology
In order to expose errors in the study setup and to refine the tasks, a pilot study was arranged with volunteers from the ICT&S Center. Since the pilot study was successful, i.e., no severe flaws were identified in the study setup, the main study was arranged as a 2x2 between-subject design in which visitors competed against each other in a Lego Mindstorm race.
The conditions were:
• Task difficulty (Easy / Hard)
• Used input modality (Gesture-based interface / PC-remote control)
Participants first had to sign an informed consent form [which can be found in Appendix A.3], giving their permission to use the resulting data for scientific purposes. After that, they were provided with a short description of the input modalities and the task they had to fulfill. The overall goal was to navigate their robot through a race track filled with obstacles, which the participants had to avoid. There were always two identical tracks in parallel, one for each person. At the end of each track, a Lego box was placed, which had to be transported back to the starting point (cf. Figure 5.1).
One participant used the gesture-based interface, whereas the other one used the PC-remote control, which made it possible to compare the two input devices in terms of efficiency and effectiveness. The participant who first managed to transport the box with the help of the robot won the race. In order to gather data on user satisfaction, participants were asked to fill in a questionnaire after the race.
During the day, the difficulty of the tracks was changed from easy to hard by adding more obstacles, in order to gain insights into the impact of different task complexities (cf. Figure 5.2).
Figure 5.1.: The Two Tracks with the Mindstorm Robots
Figure 5.2.: The Easy (left) and the Hard (right) Track Differed in the Number and Placement of Obstacles
5.1.2. Measures
The measures were derived from the EN ISO 9241-11 standard [ISO99], which defines three main measures for usability, as also proposed by Jakob Nielsen [Nie01].
Based on a further literature review, the measures were divided into two categories.
5.1.2.1. Performance Measures (Behavioral Level)
In order to evaluate the productivity potential of the different input modalities, we measured performance with respect to two factors:
• Efficiency - the time participants needed to complete the course from the starting point to the end point.
• Effectiveness - determined by counting the number of errors during navigation, i.e., collisions with other objects or barricades.
5.1.2.2. User Satisfaction Measures (Attitudinal Level)
Besides performance, the attitudinal level is also important to strengthen the results of the studies. Therefore, several factors were assessed:
• The perceived task complexity, in order to check whether the manipulation of the track difficulty was successful
• Acceptance, intuitiveness, and satisfaction regarding the used device
The user satisfaction measures were captured by a questionnaire, which was inspired by a survey used by a colleague [MBF+09] to evaluate the intuitiveness of the Nabaztag robot as an ambient interface. In addition, the USUS evaluation framework [Wei10] served as a source for the questionnaire, which consisted of 15 questions with a 5-point Likert scale. Table 5.1 shows the questionnaire item concerning the perceived task complexity. Participants had to check one of the boxes, which ranged from “not at all” to “very”; the last option, “k.A.”, meant no answer. Furthermore, participants were asked to justify their rating on the line under the checkboxes.
Table 5.1.: The Item “Perceived Task Complexity” with a 5-point Likert Scale
The questionnaire consisted of:
• 1 item for perceived task complexity
• 5 items for intuitiveness
• 4 items for satisfaction
• And 5 items for the acceptance scale
Apart from the user satisfaction measures, the survey also collected demographic data, such as gender and age, as well as which device was used. The survey was filled in within the study area to avoid participants being distracted by a change of context, which could possibly have influenced the results of the questionnaire. The full questionnaire can be found in Appendix A.4.
Moreover, the study was videotaped, in order to recheck the measured time and number of collisions as well as to identify observable problems with the input devices.
5.1.3. Results
Although it was a challenge to conduct a user study in such an open space context because of the frequent change of visitors, it offered, on the other hand, the opportunity to study many people with varying socio-demographic backgrounds in a short time.
All in all, 24 participants took part in the study, 10 of them female and 14 male. The youngest participant was 11 years old, the oldest 66. The mean age was 36.73 years with a standard deviation of 14.63.
For the data analysis, 4 data sets were excluded for two reasons:
• 2 participants completed the race without an opponent, with the consequence that there was no real race condition. This would likely have affected the results; therefore, these two data sets were excluded from the analysis.
• Furthermore, a woman did not want to disappoint her son and therefore let him win the race. These two data records were also removed because that situation would have distorted the data as well.
5.1.3.1. Performance Measures (Behavioral Level)
To begin with, it was checked whether the task complexity was successfully varied by the placement and number of obstacles, in other words, whether the hard track was really more complex than the easy one. For this purpose, a Mann-Whitney U test on the number of collisions for the two task complexities was run. The test revealed that the number of collisions was significantly higher for the more complex track, z = 2.899, p = 0.004; the mean rank for the hard task was 13.50, whereas for the easy one it was 6.00. The track solution time was also significantly different for the two tracks, z = 2.162, p = 0.031, with an average rank of 7.00 for the easy task and 12.83 for the hard one. The absolute means concerning the performance can be seen in Table 5.2.
Table 5.2.: Efficiency and Effectiveness in Dependency of the Track Difficulty
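The thesis does not state which statistics software was used for these tests. Purely as an illustration of the procedure, the following sketch computes a Mann-Whitney U test on two groups of collision counts in Java using the Apache Commons Math 3 library; the sample values are fictitious and do not reproduce the reported results.

import org.apache.commons.math3.stat.inference.MannWhitneyUTest;

public class CollisionTestSketch {
    public static void main(String[] args) {
        // Fictitious collision counts per participant for the two track difficulties.
        double[] collisionsEasyTrack = {0, 1, 0, 2, 1, 0, 1, 0, 1, 0};
        double[] collisionsHardTrack = {3, 4, 2, 5, 3, 4, 2, 3, 5, 4};

        MannWhitneyUTest test = new MannWhitneyUTest();
        double u = test.mannWhitneyU(collisionsEasyTrack, collisionsHardTrack);
        double p = test.mannWhitneyUTest(collisionsEasyTrack, collisionsHardTrack);

        // Prints the U statistic and the asymptotic two-sided p-value.
        System.out.printf("U = %.2f, p = %.4f%n", u, p);
    }
}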