Seventh Large Open Pit Mining Conference 2010

The Development of a Novel Eye Tracking Based Remote Camera Control for Mining Teleoperation


D Zhu 1, T Gedeon 2 and K Taylor 3

1. Dingyun Zhu PhD Candidate, CSIRO ICT Centre & School of Computer Science, ANU, CS&IT Building, Australian National University, North Road Acton ACT 0200, Canberra. [email protected]

2. Tom Gedeon Professor, School of Computer Science, ANU, CS&IT Building, Australian National University, North Road Acton ACT 0200, Canberra. [email protected]

3. Ken Taylor Research Scientist, CSIRO ICT Centre, CS&IT Building, Australian National University, North Road Acton ACT 0200, Canberra. [email protected]


ABSTRACT

The use of teleoperation in mining is increasing due to requirements to improve safety and to reduce the number of people required to work in remote and difficult environments. CSIRO's recent collaborative project with Rio Tinto Iron Ore developed a teleoperated control system for the primary rock breaker at the West Angelas mine in Western Australia, with the objective of demonstrating the feasibility of remote rock breaking over long distances. This required controlling both the rock breaker and an array of pan-tilt-zoom (PTZ) cameras using joysticks. Controlling the cameras distracted operators from the primary task of controlling the rock breaker to position and fire the hammer. This lengthened task times, as both the operator's attention and their hands had to switch between different control interfaces.

In this paper, we present a novel design using human eye gaze as an interactive input for remote camera control. It follows a simple and natural design principle: wherever you look on the screen, that point is brought to the centre. A prototype system has been implemented by integrating computer vision based eye tracking technology, which is real-time, robust and marker-less.

We conducted a user evaluation in which users undertook a control task with their direct view of the working area obscured, so that a remote camera transferring a live video stream provided the only visual feedback. Each subject performed the task using both the eye tracking camera control and a traditional joystick control. Statistical analysis of the objective measures indicated that our novel eye tracking control significantly outperformed the joystick control. The post-experimental subjective measures also revealed a much higher user preference for the eye tracking control, providing clear evidence that this design is a better interface for remote camera control in mining teleoperation settings.

Keywords: mining teleoperation, eye tracking, remote camera control, rock breaking, user interface, usability evaluation

INTRODUCTION

Safety is always the most important issue in any industry, and mining is no different. In order to prevent people from working in hazardous or difficult environments, teleoperation has increasingly been used as an effective solution in the mining industry. Compared to common line-of-sight control on mine sites, it allows people to remotely control or manipulate complex mining machinery over long distances, with the merits of providing replaceable surrogates, reducing the number of staff on site, and being cost-effective. Although there has been a continuous long-term effort to research and develop autonomous or semi-autonomous systems for a variety of mining tasks, human observation, intervention and supervision are still integrally involved in these systems (Hughes and Lewis, 2005).

The benefits of teleoperation are already apparent in numerous areas, ranging from space exploration, inspection, robotic navigation and surveillance to underwater operations and rescue activities. Thanks to rapidly improving, inexpensive broadband networks, users can be situated thousands of kilometres away from the control task, allowing them to work under more convenient conditions or to apply their skills at many different sites around the world. Particular examples include web-based robot control (Taylor et al, 1999), internet-based remote control of mining machinery (Kizil and Hancock, 2008), and remote surgery and simulated collaborative surgical training using haptic devices (Gunn et al, 2005).

Within the teleoperation model, the user interface is the most fundamental component, and it significantly affects overall teleoperation performance. Observations indicate that directly controlling a robot while watching a video feed from the remote camera(s) remains the most common form of interaction in teleoperation (Fong and Thorpe, 2001). The basic perceptual link between the user end and the remote environment is therefore generally a live video stream from one or more remote cameras, which provides the operator's foundational situational awareness.

The requirements for user interfaces for teleoperation of mining vehicles and systems have been briefly discussed by Hainsworth (2001), with demonstrations of two teleoperated mining systems. That work indicates that conventional user interfaces such as joysticks, switches, wheels, mice and keyboards are still the major control elements used in mining teleoperation. They are relatively simple and mature, making teleoperation a viable and profitable technique, and they satisfy the basic client requirements of robustness and reliability for mining systems. However, in most teleoperation settings the operator has multiple devices to control simultaneously, for example a robot and a remote camera at the same time. Conventional interfaces then force frequent switches of attention and hands between the different control tasks and interfaces. This distracts the operator from concentrating on the primary control task, reduces the productivity of the entire process, and increases both workload and the number of avoidable operational mistakes.

This paper addresses this attention and hand switching problem in situations where an operator controls one or more remote cameras while carrying out other mining teleoperation tasks. Instead of using conventional control interfaces that overload the operator's control capacity, we present a novel design that uses human eye gaze as an alternative input for remote camera control, based on computer vision based eye tracking technology. In a user evaluation of an implemented prototype system on a modelled experiment, data was gathered through both objective (performance) measures and subjective (user preference) measures. Through analysis of the results, we demonstrate the effectiveness of the eye tracking remote camera control for resolving this common problem in mining teleoperation.

PROJECT BACKGROUND AND RELATED WORK

Remote rock breaking in mining teleoperation

The telerobotic rock breaker (Duff et al, 2009) was a recent collaborative project between CSIRO and Rio Tinto Iron Ore. CSIRO was contracted to develop a remote control system for the primary rock breaker at the West Angelas mine (see Figure 1), situated over 1000 km north-east of Perth in Western Australia. The rock breaker on the mine site is a serial link manipulator arm with a large hydraulic hammer at the tip for breaking oversized rocks. The arm is installed at a ROM (run of mine) bin, where a number of horizontal bars (referred to as a grizzly) are fitted at the bottom to prevent oversized rocks from entering the crusher below. Figure 2 shows an overview of the rock breaker (left) and the grizzly in the ROM bin (right).


Fig 1 – West Angelas iron ore mine in Western Australia.

Fig 2 – Overview of the rock breaker (left), ROM bin with a grizzly at the bottom (right).

On the remote mine site, haul trucks carrying ore from a nearby quarry deliver their loads into the bin (see Figure 3). The operator is required to break the oversized rocks stuck on the grizzly by operating a two-handed joystick controller. The operator has limited time to break the rocks, as trucks arrive at short intervals (about 90 seconds). Since dumping a load raises a large cloud of dust, a water spray is used to settle it, which takes about 30 seconds before the operator has a clear view of the bin. The operator therefore has only about 60 seconds to move the arm from its rest position, place it carefully onto a rock, break the rock by firing the jackhammer, and return the arm to rest before the next truck arrives. The actual remote rock breaking process is shown in Figure 4: the operator works in a desktop based teleoperation environment with live video as visual feedback.

Fig 3 – Haul truck dumping a load into the ROM bin.

When the operator is trying to break a rock, they need a close view of the target so that they can identify the exact spot for positioning the tip and firing the hammer. Unlike most telerobotic or vehicle settings, where the camera is mounted on the arm and its motion is coupled to the control of the remote robot to reduce control complexity, mounting the remote camera on the arm is practically impossible here, as the camera would easily be damaged when the hammer at the tip fires to break a rock. The remote camera is instead installed on the side of the bin, transferring a zoomed-in live view back to the operator. The operator has to use a second joystick controller to adjust the camera's view of the target rock, complete the inspection of the breaking spot, and then move on to controlling the rock breaker arm. This is a typical multi-control problem that requires operators to switch hands and attention frequently between different control interfaces and tasks.

Fig 4 – Multi-control situation in remote rock breaking process.


A tip-tracking approach was introduced in (Duff et al, 2009) as a possible solution to this problem: the remote camera automatically follows the tip of the hammer by processing position data from sensors installed on the rock breaker arm and at corresponding locations around the bin. However, due to unavoidable noise from sensors operating in the harsh mining environment, the camera cannot track the tip precisely, especially when the arm is moving from or returning to its rest position. The operator may then receive insufficient visual feedback from the video stream, as the camera may not be pointing at the spot they expect to view.

Remote camera control in teleoperation

Apart from conventional remote camera control methods, several alternative approaches have been developed or discussed. For example, Cohen et al (1996) proposed using a set of circular oscillatory hand gestures to control a remote camera's pan and tilt motion. A Pan-Tilt-Zoom (PTZ) camera control system using a Wii remote and a set of infrared sensors has also been described by Goh et al (2008), motivated by the wide popularity of the Nintendo Wii and its low cost and ease of use. These approaches provide more interactive or natural alternatives to traditional remote camera control, but they still require users to pay attention and use their hands to operate.

In the early stage of this project, we explored head tracking as another alternative input for remote camera control, to address the hands-busy problem in teleoperation (Zhu et al, 2009). Two types of head tracking control were described and evaluated: "head motion control", which uses a user's continuous head movements, and "head flicking control", which works more like a switch. In head flicking control, a quick head rotation to the left or right followed by a return to the original position starts the camera panning in that direction, and a flick in the opposite direction stops it. We further extended this work by incorporating more natural forms of interaction into the control interface design for teleoperation, and conducted several user studies whose results are reported in (Zhu et al, 2010).

Eye tracking in user interface

Eye tracking has served as an augmented input medium or control modality in user interfaces for a long time. There are a number of compelling reasons why eye tracking can bring new concepts and improvements to user interface design (Zhai et al, 1999):

1. It can be an effective solution in situations that prohibit the use of the hands, for example when the user's hands are disabled (as for quadriplegic users) or continuously occupied with other tasks (such as the hands-busy problem in the rock breaking task).

2. It can increase the speed of user input, as the eye can move more quickly than the hand can move other input devices.

3. It can reduce workload, repetitive stress, fatigue (eye-based interaction is nearly fatigue-free (Saito, 1992)) and potential injury caused by physically operating other devices.


Numerous approaches, techniques, applications and systems using eye tracking have been proposed for various situations, ranging from traditional pointing to virtual environment interaction and real-world device control. In the late 1990s, Yanco (1998) introduced a prototype robotic wheelchair system with an eye tracking control interface. More recently, Tall et al (2009) constructed an experimental robotic vehicle that could be remotely driven through a gaze-controlled interface. Eye tracking has also been used to capture user intention in virtual gaming (Gedeon et al, 2008) and to lead a group of robots in accomplishing cooperative tasks (Zhu et al, 2010).

SYSTEM DESIGN AND IMPLEMENTATION

Our design for eye tracking camera control is based on the gaze coordinates on the screen. Figure 5 shows the details of the design. The approach breaks down into three steps:

1. Process the raw gaze data to filter out noisy points, and recognise gaze fixations by calculating the centroid of each group of non-noisy points.

2. Calculate the distance and the angle between the current fixation position and the centre of the screen.

3. If the fixation is within the central area (distance < the radius of the central area), the camera remains at its current position. Otherwise, the camera starts moving in the direction given by the angle calculated between the current fixation and the centre of the screen (a code sketch of steps 2 and 3 follows Figure 5).

Fig 5 – Eye tracking for camera control.
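As a concrete illustration of steps 2 and 3, the following C++ sketch converts a fixation position on the screen into a pan/tilt direction. This is a minimal sketch under our own naming, with an assumed dead-zone radius; it is not code from the actual system.

#include <cmath>

struct PanTilt { double pan; double tilt; };   // unit direction; 0,0 = hold

// Map a fixation (screen pixels, origin top-left) to a camera motion
// direction. The camera holds still while the fixation lies inside a
// central dead zone of radius deadZonePx around the screen centre.
PanTilt fixationToCommand(double fx, double fy,
                          double screenW, double screenH,
                          double deadZonePx)
{
    const double dx = fx - screenW / 2.0;       // offset from screen centre
    const double dy = fy - screenH / 2.0;
    const double dist = std::sqrt(dx * dx + dy * dy);

    if (dist < deadZonePx) {                    // step 3: fixation is central,
        return PanTilt{0.0, 0.0};               // so the camera stays put
    }
    const double angle = std::atan2(dy, dx);    // step 2: angle to the centre
    // Move so that the fixated point drifts toward the screen centre; the
    // tilt sign is flipped because screen y grows downwards.
    return PanTilt{std::cos(angle), -std::sin(angle)};
}

The returned components would then be scaled by the camera's pan and tilt speed limits; the dead zone corresponds to the central area shown in Figure 5.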

Since raw human gaze points are quite noisy and not suitable for direct use (Jacob, 1991), we used a modified version of the Velocity-Threshold Identification (I-VT) fixation detection algorithm to filter the raw gaze points into fixations; this method is straightforward to implement, runs very efficiently, and can easily run in real time (Salvucci and Goldberg, 2000). The camera keeps following the user's current fixation direction whenever the fixation is outside the central area of the screen. The overall effect is that wherever the user focuses their visual attention in the video stream, the camera brings that point to the centre of the screen. In this process we believe the user will not feel that they are performing "deliberate control" of the camera movements.
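For reference, a minimal C++ sketch of the basic I-VT idea from Salvucci and Goldberg (2000) follows. The velocity threshold and the handling of timestamps are assumptions on our part, and the sketch is simpler than the modified version used in the prototype.

#include <cmath>
#include <vector>

struct GazePoint { double x, y, t; };    // screen coords (px), timestamp (s)
struct Fixation  { double x, y; };       // centroid of one fixation group

// Basic I-VT: points whose point-to-point velocity falls below a threshold
// are fixation points; runs of consecutive fixation points are collapsed
// into a single fixation at their centroid.
std::vector<Fixation> identifyFixations(const std::vector<GazePoint>& g,
                                        double velThreshold)   // px per s
{
    std::vector<Fixation> fixations;
    double sumX = 0, sumY = 0;
    int count = 0;                       // size of the current group

    for (std::size_t i = 1; i < g.size(); ++i) {
        const double dt = g[i].t - g[i - 1].t;
        if (dt <= 0) continue;           // guard against bad timestamps
        const double v = std::hypot(g[i].x - g[i - 1].x,
                                    g[i].y - g[i - 1].y) / dt;
        if (v < velThreshold) {          // slow movement: a fixation point
            sumX += g[i].x; sumY += g[i].y; ++count;
        } else if (count > 0) {          // fast movement: saccade ends group
            fixations.push_back(Fixation{sumX / count, sumY / count});
            sumX = sumY = 0; count = 0;
        }
    }
    if (count > 0)                       // close any group still open
        fixations.push_back(Fixation{sumX / count, sumY / count});
    return fixations;
}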

The prototype system was implemented in two major parts, the user end and the remote camera site, connected by a standard network. The overall system architecture is illustrated in Figure 6.

Fig 6 – System architecture for the eye tracking remote camera control prototype.

At the user end, we integrated the FaceLAB® 4.5 eye tracking system (laptop-based version) into our prototype, which provides real-time gaze tracking at 60 Hz without the use of markers. This avoids the need for the user to wear any specialised devices, offering comfort and flexibility. The eye tracker was connected to a main PC through a local network for transferring the real-time raw gaze data. The FaceLAB® Client Tools SDK was installed on the main PC and was called by the eye tracking camera control code to receive the raw data from the local network. The control code translates the raw gaze data into corresponding camera control commands and sends them to the remote camera through the external network. The laptop-based eye tracker tracked the user's gaze on the main PC's screen, on which the user saw only the video stream from the remote camera. The eye tracking camera control code and the other software integration were all implemented in Visual C++.
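To illustrate the data flow just described, a hypothetical per-update loop at the user end might look like the following, reusing the helper sketches above. Here receiveGazePoint() and sendToCamera() are stand-ins for the FaceLAB Client Tools SDK callback and the external-network send (whose real APIs we do not reproduce), and the window size, velocity threshold, screen size and dead-zone radius are all assumed values.

// Stand-ins for the FaceLAB SDK receive and the network send.
GazePoint receiveGazePoint();                  // blocks for the next sample
void sendToCamera(const PanTilt& cmd);

// One possible control loop: keep a short sliding window of gaze samples,
// detect the latest fixation, and steer the camera accordingly.
void controlLoop()
{
    std::vector<GazePoint> window;
    for (;;) {
        window.push_back(receiveGazePoint());  // 60 Hz raw samples
        if (window.size() > 30)                // ~0.5 s window (assumed)
            window.erase(window.begin());
        const auto fixes =
            identifyFixations(window, 100.0 /* px/s, assumed */);
        if (!fixes.empty())
            sendToCamera(fixationToCommand(fixes.back().x, fixes.back().y,
                                           1280, 1024,  /* screen, assumed */
                                           80 /* dead zone px, assumed */));
    }
}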

At the other site, we used a Pelco® ES30C (the same model of camera used in the real rock breaking setting) as the remote camera, with the capability to pan and tilt simultaneously (maximum pan speed 100°/s, maximum tilt speed 30°/s). It was connected to the user end through an external network, transferring the live video stream back to the user and receiving the control commands that drive the camera movements.
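The paper does not specify the command protocol between the control code and the camera. Pelco PTZ cameras commonly accept the publicly documented Pelco-D serial protocol, so, purely as an assumed illustration, a pan/tilt command frame could be built as follows:

#include <array>
#include <cstdint>
#include <cstdlib>

// Build a seven-byte Pelco-D pan/tilt frame (a common protocol for Pelco
// PTZ cameras; whether the deployed system used it is an assumption).
// panSpeed and tiltSpeed range over -63..63; the sign gives the direction.
std::array<std::uint8_t, 7> pelcoDFrame(std::uint8_t address,
                                        int panSpeed, int tiltSpeed)
{
    std::uint8_t cmd2 = 0;
    if (panSpeed  > 0) cmd2 |= 0x02;     // pan right
    if (panSpeed  < 0) cmd2 |= 0x04;     // pan left
    if (tiltSpeed > 0) cmd2 |= 0x08;     // tilt up
    if (tiltSpeed < 0) cmd2 |= 0x10;     // tilt down

    std::array<std::uint8_t, 7> f{};
    f[0] = 0xFF;                         // sync byte
    f[1] = address;                      // camera address
    f[2] = 0x00;                         // command 1 (not used here)
    f[3] = cmd2;                         // command 2: direction bits
    f[4] = static_cast<std::uint8_t>(std::abs(panSpeed));   // pan speed
    f[5] = static_cast<std::uint8_t>(std::abs(tiltSpeed));  // tilt speed
    f[6] = static_cast<std::uint8_t>(                       // checksum over
        (f[1] + f[2] + f[3] + f[4] + f[5]) % 256);          // bytes 2-6
    return f;
}

A frame with zero speeds and no direction bits set acts as a stop command.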

USER EVALUATION

A lab-based user study was conducted to evaluate how well the eye tracking camera control performs in a model of a real-world hands-busy setting, in comparison to conventional joystick control. Given our limited access to the real rock breaker equipment, we modelled the original setting with a functional physical model (Gedeon and Zhu, 2010), using a physical game analogue: a redesigned foosball game with two handles (see Figure 7).

Fig 7 – Redesigned foosball table for the user experiment.

We recruited university students as experimental subjects. We chose to base our functional physical model on a game task because of the competitiveness and engagement we had observed among the real operators in the original setting. We believed this design could appropriately model the original rock breaking setting, while being more compelling and interesting to our student subjects than a dry, industrial-style control task.

Fig 8 – Experimental setting: (1) participant, (2) remote camera, (3) screen view of video stream, (4) eye tracker, (5) gamepad as joystick control, (6) redesigned foosball table, (7) covers to obscure participant's direct view of the foosball table.


Figure 8 shows the experimental setting of the evaluation. For each camera control method (eye tracking and joystick), participants had five minutes to play the foosball game. No pre-training time was offered with either control method. Goals and kicks were recorded as each participant's objective performance, followed by a questionnaire collecting participants' subjective preferences regarding naturalness, required consciousness, distraction, and the time needed to get used to each camera control.

EXPERIMENTAL RESULTS

The major objective measures were the number of goals and the number of kicks each participant achieved in the corresponding camera control trial. Statistical analyses of the collected data clearly show that the eye tracking control produced significantly better results than the joystick control (see Figure 9).

Fig 9 – Objective performance comparison: Goals and Kicks.

User feedback from the questionnaire was consistent with the objective results: the eye tracking control significantly outperformed the joystick control on all the criteria we selected. In addition, most participants commented directly that the eye tracking control was quite effective at resolving the hands and attention switching problem in the experiment. Compared to the joystick control, it was more convenient and flexible and required less physical movement and conscious effort, saving the time and effort of switching hands and giving participants significantly more opportunities to kick the ball and score goals.

CONCLUSIONS AND FUTURE DIRECTIONS

The work presented addresses a user interface problem in our remote rock breaking project, which requires operators to frequently switch hands and attention between different mining control interfaces. We introduced a novel design using eye tracking for remote camera control as an effective approach to this particular problem, and developed a prototype system. In a lab-based user evaluation of the prototype comparing the eye tracking control with joystick control, we collected experimental data using a functional physical model through both objective and subjective measures. The statistical analyses demonstrate that the eye tracking control clearly outperformed the joystick control, and the user preferences and comments also indicate that it can solve the switching problem effectively.

The immediate next step is to run more formal tests in the real rock breaker setting. Future directions of particular interest include exploring more designs based on natural human interaction, or multi-modal designs combining gaze with other interaction techniques for mining teleoperation, as well as investigating improved user evaluation methods.

ACKNOWLEDGEMENTS

The authors would like to express their appreciation to all the volunteers who participated in the user study, and to thank Chris Gunn, Matt Adcock and Bodhi Philpot from the Immersive Environments Team, ICT Centre, CSIRO for their valuable suggestions and help with the design and implementation of the system.

This work was supported by the "Future Mining" theme under the CSIRO National Minerals Down Under (MDU) Research Flagship. We also thank Rio Tinto and the School of Computer Science, the Australian National University, for their support.

REFERENCES

Cohen, J C, Conway, L and Koditschek, D, 1996. Dynamical system representation, generation, and recognition of basic oscillatory motion gestures, in Proceedings Second International Conference on Automatic Face and Gesture Recognition, pp 60-65.

Duff, E, Caris, C, Bonchis, A, Taylor, K, Gunn, C and Adcock, M, 2009. The development of a telerobotic rock breaker, in Proceedings Seventh International Conference on Field and Service Robots (FSR 2009), Boston, USA, pp 1-10.

Fong, T and Thorpe, C, 2001. Vehicle teleoperation interfaces, Autonomous Robots, 11(1): 9-18.

Gedeon, T D and Zhu, D, 2010. Developing a natural interface for a complex task using a physical model, in Proceedings Second IEEE International Conference on Intelligent Human Computer Interaction (IHCI 2010), Allahabad, India.

Gedeon, T D, Zhu, D and Mendis, B S U, 2008. Eye gaze assistance for a game-like interactive task, International Journal of Computer Games Technology, 2008(623725): 1-10.

Goh, A H W, Yong, Y S, Chan, C H, Then, S J, Chu, P L, Chau, S W and Hon, H W, 2008. Interactive PTZ camera control system using Wii remote and infrared sensor bar, in Proceedings World Academy of Science, Engineering and Technology, pp 127-132.


Gunn, C, Hutchins, M, Stevenson, D, Adcock, M and Youngblood, P, 2005. Using collaborative haptics in remote surgical training, in Proceedings First Joint EuroHaptics Conference and Symposium on Haptic Interfaces for Virtual Environments and Teleoperator Systems (WorldHaptics 2005), Pisa, Italy, pp 481-482.

Hainsworth, D W, 2001. Teleoperation user interfaces for mining robotics, Autonomous Robots , 11 (1): 19-28.

Hughes, S and Lewis, M, 2005. Task-driven camera operations for robotic exploration, IEEE Transactions on Systems, Man, and Cybernetics – Part A: Systems and Humans, 35(4): 513-522.

Jacob, R J K, 1991. The use of eye movements in human-computer interaction techniques: what you look at is what you get, ACM Transactions on Information Systems, 9(3): 152-169.

Kizil, M S and Hancock, W R, 2008. Internet-based remote machinery control, in Proceedings First International Future Mining Conference and Exhibition 2008 , pp 151-154 (The Australasian Institute of Mining and Metallurgy: Melbourne).

Saito, S, 1992. Does fatigue exist in quantitative measurement of eye movements?, Ergonomics, 35(5/6): 607-615.

Salvucci, D D and Goldberg, J H, 2000. Identifying fixations and saccades in eye tracking protocols, in Proceedings 2000 Symposium on Eye Tracking Research and Applications (ETRA 2000), pp 71-78.

Tall, M, Alapetite, A, Agustin, J S, Skovsgaard, H H, Hansen, J P, Hansen, D W and Mollenbach, E, 2009. Gaze-controlled driving, in Proceedings Twenty-Seventh ACM SIGCHI Conference Extended Abstracts on Human Factors in Computing Systems (CHI 2009), Boston, USA, pp 4387-4392.

Taylor, K, Dalton, B and Trevelyan, J, 1999. Web based telerobotics, Robotica , 17(1): 49-57.

Yanco, H, 1998. Wheelesley, a robotic wheelchair system: indoor navigation and user interface, Assistive Technology and Artificial Intelligence, Applications in Robotics, User Interfaces and Natural Language Processing, pp 256-268.

Zhai, S, Morimoto, C and Ihde, S, 1999. Manual and gaze input cascaded (MAGIC) pointing, in Proceedings Seventeenth ACM SIGCHI Conference on Human Factors in Computing Systems (CHI 1999), pp 246-253.

Zhu, D, Gedeon, T and Mendis, B S U, 2010. Fuzzy methods and eye gaze for cooperative robot communication, International Journal of Intelligent Information and Database Systems, 4(1): 43-59.

Zhu, D, Gedeon, T and Taylor, K, 2009. Keyboard before head tracking depresses user success in remote camera control, in Proceedings Twelfth IFIP TC13 Conference on Human-Computer Interaction (INTERACT 2009), Uppsala, Sweden, Lecture Notes in Computer Science, Vol. 5727, pp 319-331.

Zhu, D, Gedeon, T and Taylor, K, 2010. Natural interaction enhanced remote camera control for teleoperation, in Proceedings Twenty-Eighth ACM SIGCHI Conference Extended Abstracts on Human Factors in Computing Systems (CHI 2010), Atlanta, USA.
