Exploring Miscommunication and Collaborative Behaviour in Human-Robot Interaction

Theodora Koulouri
Department of Information Systems and Computing
Brunel University
Middlesex UB8 3PH
[email protected]

Stanislao Lauria
Department of Information Systems and Computing
Brunel University
Middlesex UB8 3PH
[email protected]

Abstract

This paper presents the first step in designing a speech-enabled robot that is capable of natural management of miscommunication. It describes the methods and results of two WOz studies, in which dyads of naïve participants interacted in a collaborative task. The first WOz study explored human miscommunication management. The second study investigated how shared visual space and monitoring shape the processes of feedback and communication in task-oriented interactions. The results provide insights for the development of human-inspired and robust natural interfaces in robots.

1 Introduction

Robots are now escaping laboratory and industrial environments and moving into our homes and offices. Research activities have focused on offering richer and more intuitive interfaces, leading to the development of several practical systems with Natural Language Interfaces (NLIs). However, there are numerous open challenges arising from the nature of the medium itself as well as the unique characteristics of Human-Robot Interaction (HRI).

1.1 Miscommunication in Human-Robot Interaction

HRI involves embodied interaction, in which humans and robots coordinate their actions sharing time and space. As most speech-enabled robots remain in the labs, people are generally unaware of what robots can understand and do, resulting in utterances that are out of the functional and linguistic domain of the robot. Physical co-presence leads people to make strong but misplaced assumptions of mutual knowledge (Clark, 1996), increasing the use of underspecified referents and deictic expressions. Robots operate in and manipulate the same environment as humans, so failure to prevent and rectify errors has potentially severe consequences. Finally, these issues are aggravated by unresolved challenges in automatic speech recognition (ASR) technologies. In conclusion, miscommunication in HRI grows in scope, frequency and cost, impelling researchers to acknowledge the necessity of integrating miscommunication into the design process of speech-enabled robots.

1.2 Aims of study

The goal of this study is two-fold: first, to incorporate "natural" and robust miscommunication management mechanisms (namely, prevention and repair) into a mobile personal robot which is capable of learning by means of natural language instruction (Lauria et al., 2001); secondly, to offer some insights that are relevant to the development of NLIs in HRI in general. This research is largely motivated by models of human communication. It is situated within the language-as-action tradition, and its approach is to explore and build upon how humans manage miscommunication.

2 Method

We designed and performed two rounds of Wizard of Oz (WOz) simulations. Given that the general aim of the study is to determine how robots should initiate repair and provide feedback in collaborative tasks, the simulations departed from the typical WOz methodology in that the wizards were also naive participants. The domain of the task is navigation.


In particular, the user guided the robot to six designated locations in a simulated town. The user had full access to the map, whereas the wizard could only see the surrounding area of the robot. Thus, the wizard relied on the user's instructions on how to reach the destination. In this section we outline the aim and approach of each WOz study, the materials used and the experimental procedure. Sections 4 and 5 focus on each study individually and their results.

2.1 The first WOz study

This study is a continuation of previous work by the authors (Koulouri and Lauria, 2009). In that study, the communicative resources of the wizard were incrementally restricted, from "normal" dialogue capabilities towards the capabilities of a dialogue system, in three experimental conditions:

- The wizard simulates a super-intelligent robot capable of using unconstrained, natural language with the user (henceforth, Unconstrained condition).
- The wizard can select from a list of default responses but can also ask for clarification or provide task-related information (henceforth, Semi-Constrained condition).
- The wizard is restricted to choosing from a limited set of canned responses, similar to a typical spoken dialogue system (SDS).

The current study investigates the first two conditions and presents new findings.

2.2 The second WOz study

The second round of WOz experiments explored the effects of monitoring and shared visual information on the dialogue.

2.3 Set-up

A custom Java-based system was developed and designed to simulate the existing prototype (the mobile robot). The system consisted of two applications which sent and received coordinates and dialogue, and which were connected using the TCP/IP protocol over a LAN. The system kept a log of the interaction and of the robot's coordinates.

The user's interface displayed the full map of the town (Figure 1). The dialogue box was below the map. Similar to an instant messaging application, the user could type his/her messages and see the robot's responses appearing on the lower part of the box. In the first WOz study, the user's interface included a small "monitor" on the upper right corner of the screen that displayed the current surrounding area of the robot, but not the robot itself. For the purposes of the second study, this feature was removed (see Figure 1 in Appendix A).

Figure 1. The user's interface.

The wizard's interface was modified according to the two experimental conditions. For both conditions, the wizard could only see a fraction of the map: the area around the robot's current position. The robot was operated by the wizard using the arrow keys on the keyboard. The dialogue box of the wizard displayed the most recent messages of both participants as well as a history of the user's messages. The buttons on the right side of the screen simulated the actual robot's ability to remember previous routes: the wizard clicked on the button that corresponded to a known route and the robot executed it automatically. In the interface for the Unconstrained condition, the wizard could freely type and send messages (Figure 2).

Figure 2. The wizard's interface in the Unconstrained condition.
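The paper gives no implementation detail for this set-up beyond the description above. Purely as an illustration, one endpoint of such a two-application link might look as follows; the "x,y@time|text" line format, host, port, file and class names are all assumptions, not taken from the original system:

```java
import java.io.*;
import java.net.*;

// Illustrative sketch of one side of the two-application set-up in
// Section 2.3: coordinates and dialogue travel as newline-delimited
// messages over a TCP socket, and every exchange is logged.
public class WizardEndpoint {
    public static void main(String[] args) throws IOException {
        // The user's application is assumed to listen on a LAN host/port.
        try (Socket socket = new Socket("192.168.0.10", 5000);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()));
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             PrintWriter log = new PrintWriter(new FileWriter("session.log", true))) {

            // Send the robot's current coordinates plus an utterance.
            String message = "899,445@10:44:34|There is a wall straight on.";
            out.println(message);
            log.println("SENT " + message);

            // Receive and log the partner's reply.
            String reply = in.readLine();
            log.println("RECV " + reply);
        }
    }
}
```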

In the version for the Semi-Constrained condition, the wizard could interact with the user in two ways. First, they could click on the buttons, situated on the upper part of the dialogue box, to automatically send the canned responses "Hello", "Goodbye", "Yes", "No", "Ok" and the problem-signalling responses "What?", "I don't understand" and "I cannot do that". The second way was to click on the "Robot Asks Question" and "Robot Gives Info" buttons, which allowed the wizard to type his/her own responses (see Figure 2 in Appendix A).

2.4 Procedure

A total of 32 participants were recruited, 16 users and 16 wizards. The participants were randomly assigned to the studies, experimental conditions and to the roles of wizard or user. The pairs were seated in different rooms equipped with a desktop PC. The wizards were given a short demonstration and a trial period to familiarise themselves with the operation of the system, and were also informed about whether the users would be able to monitor them. The users were told that they would interact with a robot via a computer interface; this robot was very fluent in understanding spatial language and could give appropriate responses, and it could learn routes but had limited vision. The users were asked to begin each task whenever they felt ready by clicking on the links on their computer screen, to start the interaction with "Hello", which opened the wizard's application, and to end it with "Goodbye", which closed both applications. The participants received verbal and written descriptions of the experiment. They were not given any specific guidelines on how to interact or what routes to take. However, the users were asked not to use directions such as "north", "south", "up", "down", etc. The wizards were also advised not to try to "sound like a robot", but to respond normally.

3 Data analysis

The dialogues collected were annotated on the basis of dialogue acts (DAs), miscommunication, and task status. An example of an annotated dialogue is shown in Table 1.

ID (x,y@TIME) | T.S. | MISC | HCRC tag | Expl. | Message
U1 (899,445@10:44:34) | | | instruct | | go straight on.
R2 (880,286@10:44:59) | | Rej | explain | Imp | There is a wall straight on.
U3 (884,286@10:45:28) | | | acknowledge/instruct | | ok turn left here then take the third right
R4 (352,216@10:45:58) | | Non | check | Imp | Before the bridge?
U5 (351,68@10:46:17) | WE | Corr | explain | | this is the wrong place
R6 (351,68@10:46:39) | | | acknowledge/query-yn | | Wrong place. Should Robot go back?
U7 (351,68@10:46:55) | | | instruct | | go forward and at the crossroads keep going forward and the tube is at the end of the road
R8 (351,0@10:47:14) | WE | | explain | | Out of bounds.
R9 (351,608@10:47:47) | | | query-w | | Where to go?
U10 (364,608@10:48:12) | | | instruct | | the tube is in front of you
R11 (402,547@10:48:23) | BOT | | query-yn | | Is it this one?
U12 (402,547@10:49:7) | SUC | | reply-y | | yes it is.

Table 1. Example of an annotated dialogue. ID denotes the speaker (User or Robot), T.S. stands for task status and MISC for miscommunication.

3.1 Annotation of dialogue acts

The DAs in the corpus were annotated following the HCRC coding scheme (Carletta et al., 1996). Motivated by Skantze (2005), the last column in Table 1 contains information on the explicitness of the response. This feature was only relevant for repair initiations by the wizards. For instance, responses like "What?" and the ones in Table 3 were considered to be explicit (EX) signals of miscommunication, whereas lines 2 and 4 in the dialogue above were labelled as implicit (IMP).

3.2 Annotation of task execution status

The coordinates (x,y) of the robot's position, recorded for every exchanged message, were placed on the map of the town (of dimensions 1024x600 pixels), allowing the analysts to retrace the movements of the robot. Wrong executions (WE) were determined by juxtaposing the user's instruction with the robot's execution, as indicated by the coordinates. Back-on-Track (BOT) was tagged when the first user instruction after a wrong execution was executed correctly. Finally, task success (SUC) was labelled when the robot reached the destination and this was confirmed by the user.
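This tagging procedure is mechanical enough to sketch in code. The following is only an illustration of the logic described above, not the authors' tool; the Move record and the matchesInstruction flag are hypothetical stand-ins for the analysts' manual judgement of whether an execution fits the instruction:

```java
import java.util.List;

// Illustrative sketch of the task-status tagging of Section 3.2:
// WE when an executed move contradicts the instruction, BOT when the
// first instruction after a wrong execution is executed correctly,
// SUC when the robot is at the destination and the user confirms it.
public class TaskStatusTagger {
    record Move(int x, int y, boolean matchesInstruction) {}

    static List<String> tag(List<Move> moves, int goalX, int goalY,
                            boolean userConfirmed) {
        List<String> tags = new java.util.ArrayList<>();
        if (moves.isEmpty()) return tags;
        boolean pendingWrongExecution = false;
        for (Move m : moves) {
            if (!m.matchesInstruction()) {
                tags.add("WE");                 // wrong execution
                pendingWrongExecution = true;
            } else if (pendingWrongExecution) {
                tags.add("BOT");                // back on track
                pendingWrongExecution = false;
            }
        }
        // SUC only when the robot is at the goal and the user confirms it.
        Move last = moves.get(moves.size() - 1);
        if (last.x() == goalX && last.y() == goalY && userConfirmed) {
            tags.add("SUC");
        }
        return tags;
    }
}
```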

3.3 Annotation of miscommunication

The annotation of instances of miscommunication in the dialogues is based on the definitions given by Hirst et al. (1994). Miscommunication includes three categories of problems: misunderstandings, non-understandings and misconceptions. First, misunderstandings occur when the hearer obtains an interpretation which is not aligned with what the speaker intended him/her to obtain. In this study, without attempting to unveil the intention of the user, misunderstandings were tagged when the user (who was monitoring the understanding) signalled a wrong execution (see line 5 in Table 1). These correction tags (Corr) did not always coincide with wrong execution tags, but were used when the user became aware of the error (after receiving visual or verbal information). Following the same definition, misunderstandings were also tagged as rejections (tag: Rej) when the wizard expressed inability to execute the instruction although he/she was able to interpret it (for instance, given the robot's current location, as shown in line 2 in the dialogue). Secondly, non-understandings (tag: Non, line 4) occurred when the wizards obtained no interpretation at all, or too many. Non-understandings also included cases in which wizards were uncertain about their interpretation (as suggested by Gabsdil, 2003). Lastly, misconceptions happen when the beliefs of the interlocutors clash; they are outside the scope of this study.

4 First WOz study

Skantze (2005) and Williams and Young (2004) performed variations of WOz studies to explore how humans handle ASR errors, using a real or simulated speech recogniser. They discovered that even after highly inaccurate recognition output, the participants rarely signalled non-understanding explicitly. Accordingly, the experimental hypothesis of the present study is that wizards in both conditions will not choose explicit responses to signal miscommunication (such as "I don't understand" or "What?") but responses that contribute information.

ASR is a major source of errors in SDS. But as miscommunication is ubiquitous in interaction, there are many other sources of ambiguity that give rise to problematic understanding. Thus, for the current purposes of this work, it was decided to exclude ASR, which would have an overwhelming effect on the interaction that might prevent the observation of other interesting dialogue phenomena.

This section describes further work on the Unconstrained and Semi-Constrained conditions (see Section 2.1). Twenty participants were recruited and randomly allocated to each condition.

4.1 Results

Analysis of the dialogues of the Unconstrained condition reinforced previous findings and confirmed the experimental hypothesis. In particular, wizards never used explicit repairs, but preferred to describe their location and to request clarification and further instructions. Integrating a finer classification of clarification requests (CRs) with the original dialogue act tagging, the DAs used by the wizards to signal non-understandings and rejections were categorised as shown in Table 2.

Dialogue Act | Explanation
Explain | The wizard gives a description of the robot's location. E.g., "I crossed the bridge.", "I am at a crossroad".
Check | This category covers CRs. The corpus contained two types of CRs: first, task-level reformulations (as in line 4 in Table 1), which reformulate the utterance on the basis of its effects on the task, showing the wizard's subjective understanding (Gabsdil, 2003); second, alternative CRs, which occur when the wizard gives two alternative interpretations, trying to resolve referential ambiguity. For instance, "back to the bridge or to the factory", to resolve "go back to last location".
Query-w | The wizard asks for further instructions. E.g., "Please give me further instructions."
Explain+Query-w | A combination of actions; the wizard provides information on location and asks for further instructions. E.g., "crossroads, now where?"

Table 2. Wizard DAs after miscommunication.

Figure 3 illustrates the distribution of these responses to signal non-understandings and rejections (columns labelled "Uncons-NON" and "Uncons-REJ", respectively). Evidently, there is a much greater variety of CRs than the two CR types reported here, as described in the work of Purver (2006) and Schlangen (2004). However, for a navigation task, and having excluded ASR errors, problems occurred mainly at the meaning recognition level (explained below) and mostly concerned reference resolution.

Figure 3. Use of strategies to signal non-understandings or rejections, for either condition.

In conclusion, wizards in the Unconstrained condition did not directly signal problems in understanding but, instead, attempted to advance the dialogue by providing task-related information, either in the form of CRs or as simple statements. The study contributes to the findings presented in Skantze (2005) and Williams and Young (2004) in that it demonstrates the use of similar strategies to deal with different sources of problems.
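For the eventual robot, the four categories in Table 2 suggest a small repertoire of implicit repair responses. The following is a hedged sketch of such a repertoire: the example surface forms are taken from the paper, but the selection rule is our own invention, not the authors' method:

```java
// Sketch of the four wizard strategies from Table 2 as a repertoire the
// robot could draw on when it cannot fully interpret an instruction.
// The enum mirrors Table 2; the selection rule below is illustrative only.
public enum RepairStrategy {
    EXPLAIN("I crossed the bridge."),                 // describe own location
    CHECK("Back to the bridge or to the factory?"),   // clarification request
    QUERY_W("Please give me further instructions."),
    EXPLAIN_QUERY_W("Crossroads, now where?");        // location + request

    private final String example;
    RepairStrategy(String example) { this.example = example; }

    /** Pick a strategy from a crude reading of the robot's state. */
    public static RepairStrategy select(boolean hasPartialInterpretation,
                                        boolean knowsOwnLocation) {
        if (hasPartialInterpretation) return CHECK;
        if (knowsOwnLocation) return EXPLAIN_QUERY_W;
        return QUERY_W;
    }

    public String example() { return example; }
}
```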

In the Semi-Constrained condition, a degree of restraint and control over the error handling capacity of the wizards was introduced. In particular, the wizards could explicitly signal communication problems at the utterance, meaning and action levels using three predefined responses. This is inspired by the models of Clark (1996) and Allwood (1995), according to which miscommunication can occur at any of these levels and people select repair initiations that point to the source of the problem. The model (adapted from Mills and Healey, 2006) and the responses are schematically shown in Table 3 below.

Levels of Communication | Wizard Responses
Level 1: Securing Attention | -
Level 2: Utterance Recognition | "What?"
Level 3: Meaning Recognition | "Sorry, I don't understand."
Level 4: Action Recognition | "I cannot do that."

Table 3. Levels of communication.

Moreover, based on the classification of the wizards' error handling strategies in the Unconstrained condition (Table 2), we collapsed the observed strategies into two categories of responses, which resulted in adding two more error handling buttons; namely, the button denoted "Robot Asks Question" corresponded to the "Check" and "Query-w" strategies, and "Robot Gives Info" was associated with "Explain". This clear labelling of the error handling actions presented to the wizards of the Semi-Constrained condition aimed to "coerce" them to use the strategies in a more transparent way. This could allow us a glimpse into the mechanisms and processes underlying human miscommunication management.

Analysis of the dialogues revealed that in the Semi-Constrained condition wizards employed both explicit and implicit strategies. Figure 4 shows the distribution of explicit and implicit responses to signal non-understandings and rejections. Figure 3 shows the frequency of each implicit strategy to signal non-understandings (Semi-NON) and rejections (Semi-REJ). The initial prediction was that wizards would not use explicit signals of problems in the dialogue. This was contradicted by the results. It can be argued that the physical presence of the buttons and the lower effort required account for this phenomenon. On the other hand, it is also plausible to assume that these strategies matched what the wizards wanted to say. Finally, there were no significant differences between conditions in terms of user experience, task success and time on task (as reported in Koulouri and Lauria, 2009).

Figure 4. Occurrence of implicit and explicit miscommunication signals (Semi-Constrained).

4.2 Discussion and future work

The findings of this study could be extrapolated to HRI. Classification of the responses of the wizards resulted in a limited set of error signalling strategies. Therefore, in the presence of miscommunication the robot could use the static, explicit strategies. But these strategies alone are inadequate (as shown by Koulouri and Lauria, 2009). They need to be supplemented, but not entirely replaced, with dynamic error handling strategies; namely, posing relevant questions and providing descriptions of location. Yet this entails several challenges. Gabsdil (2003) identifies the complexity of adding clarification requests to systems with deep semantic processing. With regard to alternative clarifications, systems would need to generate two alternative interpretations for one referent. Task-level reformulations would also require the system to have the capability to identify the effects of all possible executions of the instruction. As a next step, we will focus on issues concerning the implementation of such functionality.

Schlangen (2004) suggests that "general-purpose" repair initiations, such as "What?", which request repetition of the whole utterance, are more severe for the dialogue compared to reprise fragments (e.g., "Turn where?") that accept part of the utterance. Mills and Healey (2006) also found that "What's" were more disruptive to the dialogue than reprise fragments. Guided by these insights, our current work looks at how each error strategy affects the subsequent unfolding of the dialogue.
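One way to read this design implication in code is to treat the explicit responses of Table 3 as level-specific fallbacks behind the dynamic strategies of Table 2. This is only an illustrative sketch of that reading, assuming the paper's levels and canned strings; the control flow is not from the paper:

```java
// Sketch pairing the explicit responses of Table 3 with the dynamic
// strategies discussed in Section 4.2. The levels and canned strings come
// from the paper; treating the explicit response as a fallback when no
// dynamic strategy applies is our own illustrative reading.
public class MiscommunicationSignaller {
    enum Level { ATTENTION, UTTERANCE_RECOGNITION, MEANING_RECOGNITION, ACTION_RECOGNITION }

    static String explicitResponse(Level level) {
        return switch (level) {
            case UTTERANCE_RECOGNITION -> "What?";
            case MEANING_RECOGNITION   -> "Sorry, I don't understand.";
            case ACTION_RECOGNITION    -> "I cannot do that.";
            case ATTENTION             -> "";  // Table 3 lists no response here
        };
    }

    /** Prefer a dynamic strategy (a question or a location description);
     *  fall back to the level-specific canned response otherwise. */
    static String respond(Level level, String dynamicCandidate) {
        return dynamicCandidate != null ? dynamicCandidate : explicitResponse(level);
    }
}
```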

5 The second WOz study

Research in human communication has shown that in task-oriented interactions visual information has a great impact on dialogue patterns and improves performance in the task. In particular, Gergle et al. (2004), Clark and Krych (2004) and Brennan (2005) explored different communication tasks and compared a condition in which visual and verbal information was available with a speech-only condition. In their experiments, a person gave instructions to another participant on how to complete a task. Their findings largely converge. In terms of time for task completion and number of words per turn, the interactions in the visual information condition were more efficient. The physical actions of the person following the instructions functioned as confirmations and substituted for verbal grounding. Regarding errors, no significant differences were observed between visual and speech-only conditions. Motivated by these findings in human-human interaction, the second study aims to identify the differences in the processes of communication depending on whether the user can or cannot monitor the actions of the robot.

5.1 Experimental design

The study followed a between-subjects factorial design. Experiments were performed for four different conditions, as illustrated in Table 4. The conditions "Monitor, Unconstrained" and "Monitor, Semi-Constrained" were the same as in the first study. Five pairs of participants were recruited to each of the Monitor conditions and three pairs to each of the No Monitor conditions.

 | Unconstrained | Semi-Constrained
Monitor | Monitor, Unconstrained | Monitor, Semi-Constrained
No Monitor | No Monitor, Unconstrained | No Monitor, Semi-Constrained

Table 4. The design of the 2nd study.

5.2 Results

The data collection resulted in 96 dialogues, 93 of which were used in the analysis. The data were analysed using a two-way ANOVA. All effects that were found to be significant were verified by t-tests. The efficiency of interaction was determined using the following measures: time per task, number of turns, words, miscommunication-tagged turns, wrong executions and task success.

Time per task: The second column of Table 5 displays the average completion time per task in the four conditions. As expected, a main effect of the Monitor factor was found (F=4.879, df=1,11, p<0.05). Namely, when the user could monitor the robot's area, the routes were completed faster. The interaction effect between factors was also marginally significant (F=4.225, df=1,11, p<0.1); pairs in the No Monitor, Semi-Constrained condition could not compensate for the lack of visual information and took longer for each task.

Number of turns and words: The aforementioned studies correlate task efficiency with number of turns and words. In terms of the mean number of turns per interaction, no significant differences were found across the groups. Nevertheless, we measured the number of words used per task and, in accordance with previous research, observed that pairs in the No Monitor conditions used more words (F=4.602, df=1,11, p=0.05). However, it was the wizards in the No Monitor conditions that had to be more "talkative" and descriptive (F=10.324, df=1,11, p<0.01). Figure 5 shows the "word-possession" rates attributed to wizards in the four conditions. Moreover, there seems to be a difference (F=4.397, df=1,11, p=0.05) in the mean number of words per turn. In particular, when the wizards' actions were visible to the users, the wizards required fewer words per turn. There is also an interaction effect showing a more pronounced difference between the Monitor, Semi-Constrained condition and the No Monitor, Semi-Constrained condition (F=5.970, df=1,11, p<0.05); in the former, wizards managed with fewer than 2 words per utterance, taking full advantage of the luxury of the buttons and the fact that they were supervised, whereas in the latter, wizards used more than 6 words per turn.

Figure 5. Words used by wizards over total.

Frequency of miscommunication: We measured the number of turns that were tagged as containing miscommunication. Surprisingly, miscommunication rates were much lower in the No Monitor conditions (F=13.316, df=1,11, p<0.01), not in the conditions in which the user could check the actions and understanding of the robot at all times. The same pattern was found for user-initiated and robot-initiated miscommunication. The rates of miscommunication are included in the third column of Table 5.

Wrong executions: Analysis of the number of wrong executions per task reveals a similar effect; wrong executions occurred much less frequently when the wizards were not supervised by the users (F=6.046, df=1,11, p<0.05). They made on average 1 mistake per task, whereas the average number of wrong executions for the pairs in the Monitor conditions was 5 (fourth column in Table 5).

Task success rates: There were no differences in the number of interrupted or aborted tasks.

Condition | Time per Task (min) | Miscommunication Turns/Total Turns | #Wrong Executions per Task
Mon, Uncons | 4.57 | 8.21% | 4.2
Mon, Semi | 4.63 | 8.82% | 5.8
No Mon, Uncons | 5.67 | 2.55% | 1.0
No Mon, Semi | 7.41 | 1.71% | 0.7

Table 5. Summary of results (mean values).

5.3 Discussion and future work

These results are consistent with previous research. The conditions in which the user could see exactly what the robot saw and did resulted in faster task completion and shorter dialogues. However, a finding emerged which was not expected on the basis of the aforementioned studies: in the conditions in which users could not monitor the robot's actions, the wizards were more accurate, leading to a low occurrence of wrong executions and miscommunication (see columns 3 and 4 in Table 5). The principle of "least collaborative effort" is balanced against the need to ensure understanding. Thus, wizards provided rich and timely feedback to the users in order to compensate for the lack of visual information. This feedback acted in a proactive way and prevented miscommunication and wrong executions. In the Monitor conditions, asymmetries in perceived responsibility and knowledge between the participants could have encouraged wizards to be less cautious in acting. In other words, as the user had access to the full map and the location of the wizard, the wizard felt less "obliged" to contribute to the interaction. However, due to the complex nature of the task, unless the wizard could sufficiently communicate the relevant position of the robot, the directions of the user would more likely be incorrect. It could also be assumed that, since visual feedback is instant, the users were more inclined to issue commands in a "trial and error" process. Irrespective of the underlying motives, these findings show that, despite higher costs in time and word count, linguistic resources were adequate for completing complex tasks successfully. The findings also resonate with the collaborative view of communication. The wizards adapted their behaviour in response to variations in the knowledge state of their partners and made up for the lack of visual information with rich verbal descriptions of their locations.

We are currently performing more experiments to balance the data sets of the study and validate the initial results. Moreover, a fine-grained analysis of the dialogues is under way, focusing on the linguistic content of the interactions. The aim is identical to that of the first WOz study, that is, to identify the strategies of the wizards in the presence and absence of visual information.

These results have important implications for HRI. As in human collaborative interaction, the robot's communicative actions have a direct impact on the actions of the users. In real-world settings, there will be situations in which the users cannot monitor the robot's activities, or in which their information and knowledge are either constrained or outdated. Robots that can dynamically determine and provide appropriate feedback could help the users avoid serious errors. Nevertheless, this is not a straightforward process; providing excessive, untimely feedback compromises the "naturalness" and efficiency of the interaction. The amount and placement of feedback should be decided on the basis of several knowledge sources, combined in a single criterion that is adaptive within and between interactions. These issues are the object of our future work and implementation.
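The paper leaves the design of such a criterion open. Purely as a sketch of the idea, the following folds several knowledge sources into one score that decides whether the robot should volunteer feedback now; every source, weight and threshold here is hypothetical:

```java
// Very rough sketch of a single adaptive feedback criterion combining
// several knowledge sources, as suggested in Section 5.3. All sources,
// weights and thresholds are invented for illustration.
public class FeedbackPolicy {
    private double threshold = 0.5;  // adapted between interactions

    public boolean shouldGiveFeedback(boolean userCanMonitor,
                                      double recentErrorRate,
                                      long msSinceLastFeedback) {
        double score = 0.0;
        if (!userCanMonitor) score += 0.4;              // no shared visual space
        score += Math.min(recentErrorRate, 1.0) * 0.4;  // trouble in the dialogue
        score += msSinceLastFeedback > 10_000 ? 0.2 : 0.0;  // avoid long silences
        return score >= threshold;
    }

    /** Adapt within an interaction: back off if feedback proved untimely. */
    public void penaliseUntimelyFeedback() {
        threshold = Math.min(0.9, threshold + 0.1);
    }
}
```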

6 Concluding remarks

One of the most valuable but complex processes in the design of an NLI for a robot is enacting an HRI scenario so as to obtain naturally-occurring data which is nevertheless generalisable and relevant to the future implementation of the system. The present study recreated a navigation scenario in which non-experienced users interacted with and taught a mobile robot. It also simulated two different set-ups, which corresponded to the realistic situations of supervised and unsupervised interaction. The current trend in the fields of linguistics and robotics is the unified investigation of spatial language and dialogue (Coventry et al., 2009). Exploring dialogue-based navigation of a robot, our study aimed to contribute to this body of research. It can be argued that there were limitations in the simulation as compared to the experimental testing of a real system and that, thus, the study was primarily explorative. However, it yielded natural dialogues, given that naive "confederates" and no dialogue script were used. The data analysis was more qualitative than quantitative and followed established methods from previous research. Finally, the results of the study matched and extended these findings, and provided useful information for the next version of the system as well as some insight into the processes of conversation and social psychology.

The next step in our research is to develop the dialogue manager of the robot to incorporate the feedback and miscommunication management strategies observed in the collected data. This holds the promise of a robust NLI that can handle uncertainties arising from language and the environment. However, miscommunication in HRI reaches beyond preventing and repairing recognition errors. Mills and Healey (2008) demonstrate that miscommunication does not inhibit but, on the contrary, facilitates semantic coordination. Martinovsky and Traum (2003) suggest that through miscommunication, people gain awareness of the state and capabilities of each other. Miscommunication, thus, is seen as an opportunity for communication. In this light, natural miscommunication management is not only the end, but also the means to shape and advance HRI, so that robots are not tools but partners that play a positive, practical and long-lasting role in human life.

References

Bilyana Martinovsky and David Traum. 2003. The Error Is the Clue: Breakdown in Human-Machine Interaction. In Proceedings of the ISCA Workshop on Error Handling in Dialogue Systems.

Dan Bohus and Alexander I. Rudnicky. 2005. Sorry, I Didn't Catch That! – An Investigation of Non-understanding Errors and Recovery Strategies. In Proceedings of SIGdial 2005, Lisbon, Portugal.

Darren Gergle, Robert E. Kraut and Susan E. Fussell. 2004. Language Efficiency and Visual Technology: Minimizing Collaborative Effort with Visual Information. Journal of Language and Social Psychology, 23(4):491-517. Sage Publications, CA.

David Schlangen. 2004. Causes and Strategies for Requesting Clarification in Dialogue. In Proceedings of the 5th Workshop of the ACL SIG on Discourse and Dialogue (SIGdial04), Boston, USA.

Gabriel Skantze. 2005. Exploring Human Error Recovery Strategies: Implications for Spoken Dialogue Systems. Speech Communication, 45(3):207-359.

Graeme Hirst, Susan McRoy, Peter Heeman, Philip Edmonds and Diane Horton. 1994. Repairing Conversational Misunderstandings and Nonunderstandings. Speech Communication, 15:213–230.

Gregory Mills and Patrick G. T. Healey. 2008. Negotiation in Dialogue: Mechanisms of Alignment. In Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue, Columbus, OH, USA.

Gregory Mills and Patrick G. T. Healey. 2006. Clarifying Spatial Descriptions: Local and Global Effects on Semantic Co-ordination. In Proceedings of the 10th Workshop on the Semantics and Pragmatics of Dialogue.

Herbert H. Clark. 1996. Using Language. Cambridge University Press, Cambridge, UK.

Herbert H. Clark and Meredyth A. Krych. 2004. Speaking While Monitoring Addressees for Understanding. Journal of Memory and Language, 50:62-81.

Jason D. Williams and Steve Young. 2004. Characterizing Task-Oriented Dialog Using a Simulated ASR Channel. In ICSLP, Jeju, South Korea.

Jean Carletta, Amy Isard, Stephen Isard, Jacqueline Kowtko, Gwyneth Doherty-Sneddon and Anne H. Anderson. 1996. HCRC Dialogue Structure Coding Manual (HCRC/TR-82). Human Communication Research Centre, University of Edinburgh.

Jens Allwood. 1995. An Activity Based Approach to Pragmatics. Gothenburg Papers in Theoretical Linguistics, 76, Göteborg University, Sweden.

Kenny Coventry, Thora Tenbrink and John Bateman. 2009. Spatial Language and Dialogue: Navigating the Domain. In K. Coventry, T. Tenbrink and J. Bateman (Eds.), Spatial Language and Dialogue, 1-8. Oxford University Press, Oxford, UK.

Malte Gabsdil. 2003. Clarification in Spoken Dialogue Systems. In Proceedings of the 2003 AAAI Spring Symposium on Natural Language Generation in Spoken and Written Dialogue, Stanford, USA.

Matthew Purver. 2006. CLARIE: Handling Clarification Requests in a Dialogue System. Research on Language and Computation, 4(2-3):259-288.

Stanislao Lauria, Guido Bugmann, Theocharis Kyriacou, Johan Bos and Ewan Klein. 2001. Training Personal Robots Using Natural Language Instruction. IEEE Intelligent Systems, 38–45.

Susan E. Brennan. 2005. How Conversation is Shaped by Visual and Spoken Evidence. In J. Trueswell and M. Tanenhaus (Eds.), Approaches to Studying World-situated Language Use: Bridging the Language-as-product and Language-action Traditions, 95-129. MIT Press, Cambridge, MA.

Theodora Koulouri and Stanislao Lauria. 2009. A WOz Framework for Exploring Miscommunication in HRI. In Proceedings of the AISB Symposium on New Frontiers in Human-Robot Interaction, Edinburgh, UK.

Appendix A. Screenshot images of the interface

Figure 1. The interface of the user without the monitor (as used in the second WOz study).

Figure 2. The interface of the wizard in the Semi-Constrained condition.
