
Künstliche Intelligenz 2:24–29, 2003

Learning to Recognise Objects and Situations to Control a Robot End-Effector

Gunther Heidemann and Helge Ritter

View-based representations have become very popular for recognition tasks. In this contribution, we argue that the potential of the approach is not yet fully tapped: tasks need not be "homogeneous", i.e. there is no need to restrict a system to, say, either "object classification" or "gesture recognition". Instead, qualitatively different problems like gesture recognition and scene evaluation can be handled simultaneously by the same system. This feature makes the view-based approach a well-suited tool for such inhomogeneous domains, as will be demonstrated for an end-effector camera. In the described scenario, the task is threefold: recognition of object types, judging the stability of grasps on objects, and hand gesture classification. As this task leads to a large variety of views, we describe a neural recognition architecture specifically designed to represent highly non-linear distributions of view samples.

1 Introduction

Computer Vision (CV) is a substantial technique in modern robotics. Not only does CV take the role of "eyes", e.g. for tasks like navigation — which is what it is expected to do from a cognitive point of view — it often also replaces other sensory input channels like haptics. However, CV capabilities are in general still far from the human example. One of the major shortcomings is that most CV systems are highly specialised to a certain task. CV applications perform the better the narrower their limitations: in counting bright spots on a dark background, CV outperforms humans by several orders of magnitude, but it cannot tell whether the spots are shells on a beach or bolts on a conveyor belt. As a consequence, several specialised modules have to be used when dealing with a variety of tasks.

During the last ten years, view-based representations have become popular for recognition tasks. This technique is a powerful tool when geometrical modelling of an object domain is intractable. The key idea is to memorise features extracted from object views, using filters generated from samples, instead of memorising representations derived from human expertise.

The view-based approach is inherently domain-independent. Nevertheless, this major benefit is not fully exploited in most applications, in the sense that they are strongly restricted to a certain domain. For example, the domain of the well-known object recognition system developed by Murase and Nayar [14] is the classification of isolated objects (toys and household items) on a turntable against a black background. Another domain in which the view-based approach is successful is face classification (e.g. [12]). Still, the restriction to a certain scenario is unavoidable. However, this paper will demonstrate that, using a suitable neural architecture, a view-based representation can be built that captures a domain which is much more complex and inhomogeneous than the above-mentioned examples: a robot hand camera that has to classify not only different objects but simultaneously grasping situations and hand gestures [6]. In the following, the experimental setup is outlined first; in section 3 the view-based classification system is described, and in section 4 the results are discussed.

2 Grasping scenario

The robot system is used in the context of a man-machine interaction scenario where a human instructor tells the robot to grasp an object on a table using hand gestures and speech. As a complete description of this system is beyond the scope of this paper, we refer to [15, 18] and outline the setup only briefly.

An active stereo camera head that is separate from the robot detects and pursues the hand movements of the instructor. When a pointing gesture is detected, the pointing direction is evaluated. Simultaneously, the active camera head surveys the scene of objects on the table, so the correspondence between pointing gesture and object can be established. The active system gives the approximate world coordinates of the indicated object to the robot system. The robot arm then carries out an approach movement until the hand camera (Fig. 1) detects the object. The final approach is controlled by the hand camera alone, without help from the active camera head. When the hand is above the object, the object identity is checked and the hand is moved into the optimal grasping position for this type of object. Then the hand is lowered and the object grasped; the oil pressure in the hydraulic motor pistons indicates whether the grasp was successful.

In this contribution, we present a supplemental vision system for the hand camera which will be integrated into the grasping architecture. So far, only the active camera head evaluates pointing gestures. Since its distance to the table is over one meter, information for correction movements during the approach phase cannot be gathered. Another shortcoming is the insufficient sensory feedback from the robot hand, from which it can only be judged whether the object is grasped at all, but not whether the grasp is stable. The active camera head cannot be used to watch grasp stability, again because of the distance. To solve both problems, a vision system for the hand camera was developed which enables visual guidance of the robot from gestures carried out under the end-effector during the approach movement, as well as a better evaluation of grasp stability.

Figure 2: View of the hand camera with the processing rectangle. Three different types of tasks have to be solved: classification of (1) object type, (2) stability of the grasp and (3) the direction indicated by a hand.

Figure 1: Three-fingered oil-hydraulic robot hand holding an object. The camera is mounted in front.

2.1 Robot setup

We use a standard industrial robot arm (PUMA 560) with six degrees of freedom. The end-effector is an oil-hydraulic robot hand developed by Pfeiffer et al. [11]. It has three equal fingers mounted in an equilateral triangle, pointing in parallel directions (Fig. 1). Each finger has three degrees of freedom: bending the first joint sideways and inwards at the wrist, and bending the coupled second and third joints. The oil pressure in the hydraulic system serves as sensory feedback. A detailed description of the setup can be found in [7]. Fig. 1 also shows a force/torque wrist sensor, which will not be used in the following.

2.2 Visual sensors for grasping

The hydraulic oil pressure alone is not sufficient as feedback to judge the success of the grasping movement. From this data it can only be judged whether the object is still in the grasp or whether it was lost. Though there are additional position sensors on the hydraulic motor pistons, these position measurements cannot be used to estimate the resulting joint angles within the required accuracy, due to mechanical hysteresis.

Insufficient sensory feedback is a major problem in the domain of grasping with grippers or even anthropomorphic artificial hands. It is one of the reasons that robotic abilities in grasping objects are no match for human skills: we can grasp objects of almost arbitrary shape and many different sizes, and we easily adapt the applied forces to light or heavy weight. This ability is the result of a complex interaction between the controlling neural system and the haptic and visual "sensors". Force sensing in the fingertips plays an important role in this process. Unfortunately, there is as yet no technical equivalent to human fingertip force sensing, though progress has been made e.g. using piezo-resistive pressure-sensitive foils for tactile imaging [9]. The problem is not to provide high-quality sensors in general, but sensors which have the size and shape of a human fingertip; such fingertip sensors are still unreliable and coarse in measurement. As a consequence, other sensory sources are used, especially CV, as miniature cameras and image processing hardware are easily available.

Fig. 1 shows the three-fingered hand with the mounted camera (Sony XC 999P). The viewpoint is chosen such that all fingertips are still visible at the image borders when the hand is completely stretched (Fig. 2). The view area is also large enough to capture a human hand held below.

2.3 The recognition task

The hand camera delivers three major categories of information: the directions of hand gestures, the identity of grasped objects and a judgement on the stability of the grasp.

For the purpose of guiding the robot, hand gestures are divided qualitatively into six classes: left, right, come, away, up and down, see Fig. 3. While the first four are intuitively clear, the classes up and down are more of a symbolic nature, which is due to the fact that a finger pointing downwards is hardly visible and a finger pointing upwards looks similar to the come gesture.

Figure 3: Classes of gestures indicating directions (from left to right and top to bottom): left, right, come (towards the user), away (from the user), up and down.

Judging grasp stability requires an individual definition of the classes stable / semi-stable / unstable for each object. Fig. 4 shows an example: a wooden cube with holes (a Baufix toy piece) is held in a stable position (left) and two semi-stable positions (middle and right). While in the stable position all three fingers are within holes of the object, one finger has lost contact in the semi-stable positions. In Fig. 7 an example can be seen for an object with a greater variety of possible grasping positions.

Figure 4: Grasps of different stability on a Baufix cube. Left: stable, all fingers grip into a hole. Middle and right: semi-stable, one finger has lost its grip. Though the difference can be seen clearly from the side (upper row), it is much more difficult to detect from the view of the hand camera (lower row).

The total number of classes $N$ is hence

$N = N_{HG} + N_{Obj} \cdot N_{Stab}$   (1)

where $N_{HG} = 6$ denotes the number of hand gestures, $N_{Obj}$ the number of objects and $N_{Stab}$ the three stability classes. However, in section 4 it will become clear that not the entire product space of object and stability classes has to be covered by the classifier.
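With the numbers used later in the experiments ($N_{HG} = 6$ hand gestures and, from section 4.1, $N_{Obj} = 5$ objects with $N_{Stab} = 3$ stability classes), eq. (1) gives for the full product space

$N = 6 + 5 \cdot 3 = 21$ output channels,

whereas the reduced coding adopted in section 4, which shares the stability channels across all objects, needs only $N_{HG} + N_{Obj} + N_{Stab} = 6 + 5 + 3 = 14$ channels.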

3 The neural recognition system

The task described above differs from typical object recognition problems in that some of the "objects" are hands or situations. Hence, a large capacity of the recognition system is required. We will outline the basic problems of such tasks in the next section and argue that the common method of principal component analysis (PCA) is not suitable in this case. Instead, in section 3.2 a neural network based architecture will be described that performs a local PCA. This method facilitates the representation of even highly inhomogeneous domains.

3.1 ANNs for recognition tasks

Artificial Neural Networks (ANNs) can acquire knowledge from examples, which makes them an ideal tool in image processing, especially when dealing with data that cannot easily be captured by explicit geometrical modelling. However, due to the "curse of dimensionality", ANNs cannot be applied directly to the raw pixel data: nets of enormous size and huge training sets would be necessary (the $81 \times 59$ grey-value windows used in section 4.2 already have 4779 raw input dimensions). Hence, a preceding feature extraction has to capture the interesting part of the image variance and remove redundancy. To avoid spoiling the advantages of ANNs by "hand-made" features, the feature extraction itself should be trained from samples.

Figure 5: (Schematic.) While simple PCA still works for sufficiently simple image data (left), a clustered distribution requires a representation by local PCA (right).

An established solution to dimensionality reduction is principal component analysis (PCA), which is related to the Karhunen-Loève transform. The image data are projected onto the $N_{PCA}$ principal components (PCs) with the largest eigenvalues, where $N_{PCA}$ is much smaller than the original dimensionality (i.e. the number of pixels). Moreover, interpolation between different object views becomes much simpler in the projection space than in the original pixel space [14]. PCA has been successfully applied e.g. in object classification [14] or face recognition [12].

Being a linear method, however, PCA is limited in that it can capture only the "global" variance of the data. This is sufficient e.g. in the object recognition application of Murase and Nayar [14]: though the domain consists of 100 objects, there are nevertheless great similarities. All objects are compact, size-normalised and centred in the middle of the image, and each object appears with only one degree of freedom (rotation). Therefore, PCA is still successful for classification and pose estimation, though it could be shown in [3] that the VPL-architecture described in the next section performs better.

In contrast, the distribution of visual data acquired by the hand camera is not only non-linear but forms several clusters in pixel space, as schematically depicted in Fig. 5. Hence, the structure cannot be approximated by PCA. A suitable non-linear extension of PCA is local PCA [19], which partitions the data into subsets for which simple PCA performs well. Note that "local" refers in this context to the partitioning in the data space, not in the image plane. The neural architecture outlined in the following is based on this idea.
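To illustrate the local-PCA idea in isolation, the following Python sketch partitions the data and fits one PCA per partition. It is not the implementation used in the paper: plain k-means stands in here for the AEV vector quantisation described in section 3.2.1, and all names are ours.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def fit_local_pca(X, n_partitions=12, n_components=8, seed=0):
    # Partition the samples in data space (k-means as a stand-in for VQ),
    # then fit a separate PCA to each partition.
    km = KMeans(n_clusters=n_partitions, n_init=10, random_state=seed).fit(X)
    pcas = [PCA(n_components=n_components).fit(X[km.labels_ == i])
            for i in range(n_partitions)]
    return km, pcas

def project_local(x, km, pcas):
    # Map a sample to the feature vector of its best-match partition.
    i = km.predict(x.reshape(1, -1))[0]
    return i, pcas[i].transform(x.reshape(1, -1))[0]

# Toy data: two elongated clusters with orthogonal main axes -- a single
# global PCA cannot represent both directions of variance with one PC.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], [5.0, 0.2], (200, 2)),
               rng.normal([10, 10], [0.2, 5.0], (200, 2))])
km, pcas = fit_local_pca(X, n_partitions=2, n_components=1)
print(project_local(X[0], km, pcas))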

[Figure 6 (schematic): the three-stage VPL architecture. Input: image window → best-match selection among reference vectors (VQ) → attached PCA-nets compute the features → selected expert LLM-classifier net → output: gesture classes, object classes, stability judgements.]

Figure 6: The neural VPL-classification system maps an image patch with pixel vector $\vec{x}$ to a vector-valued output $\vec{y}$ in three steps. First, the best-match reference vector is determined; this processing stage is trained from sample image patches by vector quantisation. Then $\vec{x}$ is projected onto the local PCs which were calculated by the attached PCA-net. Finally, the projection is classified by the selected LLM-net. The output vector $\vec{y}$ has one component for each hand gesture (direction), one for each object type and one for each stability judgement.

3.2 VPL-architecture

The trainable classifier performs a mapping $\vec{x} \to \vec{y}$, $\vec{x} \in \mathbb{R}^M$, $\vec{y} \in \mathbb{R}^N$ [3]. In computer vision, $\vec{x}$ consists of the pixels of an appropriately sized window of an image. For classification tasks, the output is a discrete-valued class $k$. In this case, one separate output channel for each of the $N$ classes should be used to avoid the artificial introduction of "neighbourhoods" within the class system (e.g. classes number 2 and 3 being closer than 2 and 10). Hence, training is performed with sample windows $\vec{x}_i^{Tr}$, $i = 1 \ldots \#\mathrm{samples}$, and corresponding binary output vectors $(\vec{y}_i^{Tr})_j = \delta_{jk}$, $j = 1 \ldots N$, coding the class $k$. The trained system classifies an unknown window $\vec{x}$ by taking the class $k$ of the output component with maximal value: $k = \arg\max_j (\vec{y}(\vec{x}))_j$.

The classifier combines visual feature extraction and classification in a three-stage architecture called VPL, which stands for Vector quantisation, PCA and LLM-network. Fig. 6 shows an overview of the architecture.

3.2.1 First level: Vector quantisation

Vector quantisation (VQ) is carried out on the raw image windows to provide a first data partitioning using $N_{VQ}$ different reference vectors $\vec{r}_i \in \mathbb{R}^M$, $i = 1 \ldots N_{VQ}$. If incremental algorithms like competitive learning are used for VQ, one of the major difficulties is codeword under-utilisation, especially if the dimensionality is high [2]: since the huge volume of a high-dimensional space cannot be "filled" by the provided data, many reference vectors don't "find" the data and remain outliers. Many solutions to this problem have been proposed, like the neural gas [10] or the Kohonen self-organising map [8]. However, since these methods introduce interactions between the reference vectors, parameters have to be chosen appropriately, which turns out to be difficult. Algorithms which are designed specifically to provide a homogeneous codebook utilisation are e.g. Activity Equalisation VQ (AEV) [5] or frequency-sensitive competitive learning [1], since they take the codeword access frequencies into account. As a homogeneously used codebook is essential within the VPL-architecture, we use AEV in the first processing stage.
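The details of the AEV algorithm are given in [5] and not reproduced here. As a rough, hedged sketch of the same idea, the following implements the related frequency-sensitive competitive learning [1], which also counteracts under-utilisation by biasing the winner selection with the codeword access counts; parameter values are illustrative.

import numpy as np

def fscl_vq(X, n_codewords=12, epochs=10, eta=0.05, seed=0):
    # Frequency-sensitive competitive learning (cf. Ahalt et al. [1]):
    # the win count scales each codeword's distance, so rarely used
    # reference vectors become "cheaper" and are pulled into the data,
    # counteracting codeword under-utilisation.
    rng = np.random.default_rng(seed)
    R = X[rng.choice(len(X), n_codewords, replace=False)].astype(float)
    counts = np.ones(n_codewords)
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            d2 = ((R - x) ** 2).sum(axis=1)
            w = np.argmin(counts * d2)      # frequency-weighted winner
            R[w] += eta * (x - R[w])        # move winner towards the sample
            counts[w] += 1
    return R

# Usage: X holds the raw (sub-sampled) image windows as row vectors.
X = np.random.default_rng(1).normal(size=(500, 64))
reference_vectors = fscl_vq(X, n_codewords=12)
print(reference_vectors.shape)              # (12, 64)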

3.2.2 Second level: PCA

Figure 7: Grasps of different stability as seen from the hand camera (lower row) and from a separate camera (upper row), from left to right: (1) stable; (2) unstable, because the left finger pushes the object out of the other two fingers; (3) stable, since two fingers are in the holes; (4) like (3), but only semi-stable because the object is tilted and one finger slipped out of the hole.

To each reference vector $\vec{r}_i$ of the primary VQ, a single-layer feed-forward network is attached for the successive calculation of the principal components (PCs). The PCA-nets are trained as proposed by Sanger [17]. The input $\vec{x}$ is projected onto the $N_{PCA}$ PCs with the largest eigenvalues: $\vec{x} \to \vec{p}_l(\vec{x}) \in \mathbb{R}^{N_{PCA}}$, $l = 1 \ldots N_{VQ}$. $\vec{p}_l(\vec{x})$ can be regarded as the feature vector of the input $\vec{x}$.
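A minimal sketch of how one such PCA-net can be trained with Sanger's rule [17] (the generalized Hebbian algorithm). Learning rate and epoch count are our illustrative choices; in the VPL, each net would be trained only on the samples assigned to its reference vector.

import numpy as np

def sanger_pca_net(X, n_components=8, eta=0.001, epochs=50, seed=0):
    # Single-layer feed-forward net learning the leading PCs of the
    # (mean-centred) data via Sanger's rule:
    #     dW = eta * (y x^T - LT[y y^T] W),  y = W x,
    # where LT keeps the lower triangle including the diagonal.
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                  # centre the data
    W = rng.normal(scale=0.01, size=(n_components, X.shape[1]))
    for _ in range(epochs):
        for x in Xc[rng.permutation(len(Xc))]:
            y = W @ x
            W += eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W                                 # rows converge to the PCs

# Usage: features are the projections onto the learned components.
X = np.random.default_rng(2).normal(size=(300, 20)) @ np.diag(np.linspace(3, 0.1, 20))
W = sanger_pca_net(X, n_components=3)
features = (X - X.mean(axis=0)) @ W.T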

3.2.3 Third level: LLM-nets

In the third processing stage, to each of the $N_{VQ}$ PCA-nets one expert neural classifier of the Local Linear Map type (LLM-net) is attached [16]. The LLM-nets perform the final mapping $\vec{p}_l(\vec{x}) \to \vec{y} \in \mathbb{R}^N$. The LLM-network is related to the self-organising map [8] and the GRBF-approach [13]. It performs a vector quantisation using $N_{LLM}$ nodes in the input space and can be trained to approximate a non-linear function by a set of locally valid linear mappings attached to the reference vectors; for details see e.g. [16]. The VQ is again carried out using the AEV-algorithm [5].

3.2.4 Training and application

The three processing stages are trained successively: first the vector quantisation and the PCA-nets (unsupervised), finally the LLM-nets (supervised). Classification of an input $\vec{x}$ is carried out by finding the best-match reference vector $\vec{r}_{n(\vec{x})}$, i.e. the vector closest to $\vec{x}$. In case $\vec{x}$ lies exactly in the middle between two reference vectors, $\vec{r}_{n(\vec{x})}$ has to be chosen by chance; however, this case is extremely rare. Subsequently, $\vec{x}$ is mapped to $\vec{p}_{n(\vec{x})}(\vec{x})$ by the attached PCA-net; finally, the mapping $\vec{p}_{n(\vec{x})}(\vec{x}) \to \vec{y}$ is performed.

The major advantage of the VPL-classifier is its ability to form many highly specific feature detectors (the $N_{VQ} \cdot N_{PCA}$ local PCs) while needing to apply only $N_{VQ} + N_{PCA}$ filter operations per classification: first $N_{VQ}$ operations to find the best-match reference vector, then $N_{PCA}$ operations to extract the feature vector. So a large set of filters storing implicit object knowledge does not have to be paid for by a high computational effort in application.
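To summarise sections 3.2.1–3.2.4, here is a hedged sketch of the complete VPL forward pass. The LLM output model (the best-match node answers with its locally valid linear map) follows the description in [16]; training of the maps is omitted, and all names and the random demo parameters are ours.

import numpy as np

def llm_output(nodes, p):
    # LLM-net [16]: each node is a tuple (c, b, A) of input centre,
    # output offset and matrix; the best-match node answers with its
    # locally linear map y = b + A (p - c).
    c, b, A = min(nodes, key=lambda n: np.sum((p - n[0]) ** 2))
    return b + A @ (p - c)

def vpl_classify(x, R, W, means, llm_nets):
    # Three-stage VPL forward pass (cf. Fig. 6).
    l = np.argmin(((R - x) ** 2).sum(axis=1))   # stage 1: VQ best match
    p = W[l] @ (x - means[l])                   # stage 2: local PCA features
    y = llm_output(llm_nets[l], p)              # stage 3: expert classifier
    return int(np.argmax(y))                    # k = argmax_j y_j

# Tiny demo with random parameters: M = 64 "pixels", N_VQ = 12,
# N_PCA = 8, N_LLM = 20 nodes per net, N = 14 output channels.
rng = np.random.default_rng(0)
M, NVQ, NPCA, NLLM, N = 64, 12, 8, 20, 14
R = rng.normal(size=(NVQ, M))                   # reference vectors
means = rng.normal(size=(NVQ, M))               # per-partition means
W = rng.normal(size=(NVQ, NPCA, M))             # rows = local PCs
llm_nets = [[(rng.normal(size=NPCA), rng.normal(size=N),
              rng.normal(size=(N, NPCA))) for _ in range(NLLM)]
            for _ in range(NVQ)]
print(vpl_classify(rng.normal(size=M), R, W, means, llm_nets))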

4 Results

The task of recognising hand gestures, object types and grasp stabilities requires, for a complete separation, a VPL-classifier with $N = N_{HG} + N_{Obj} \cdot N_{Stab}$ output channels (section 2.3). However, one channel for each object plus one for each stability class proves to be sufficient, so the stability channels are common to all objects (Fig. 6).

4.1 Image data

For evaluation, 130 images of hand gestures and 60 images of objects within the grip of the robot hand were used. The larger number of hand images is necessary due to the greater appearance variability of a hand. The gestures to be carried out were qualitatively predefined. The number of objects was $N_{Obj} = 5$; they were presented to the robot on a table in different poses, then the robot hand was lowered and closed. Some grasps were entirely unstable and the object was lost at once when lifted up. Successful grasps were classified into stable, semi-stable and unstable from human knowledge.

4.2 Classification performance

As input to the VPL, a grey-value window sub-sampled to $81 \times 59$ pixels was used, which covers most of the camera image (rectangle in Fig. 2). Performance was tested on the 190 images using the "leaving one out" strategy. Using this data base, the parameters $N_{VQ} = 12$, $N_{PCA} = 8$ and $N_{LLM} = 20$ proved to be the best choice (see Fig. 8). For the combined hand posture / object classification, 91% correct classifications could be achieved (without stability judgement). Errors were mostly caused by misclassifications of the hand gestures. In addition, the three stability judgement channels were evaluated whenever an object was detected; correctness was 84%. The relatively high error can be explained by the poor visibility of some of the grasps; an example is Fig. 7: the last two grasps differ in stability, but the tilt of the object is difficult to see from the hand camera.

Fig. 8 compares the combined hand posture and object recognition rates — i.e. the percentage of correctly classified samples — for different values of $N_{VQ}$ and $N_{PCA}$ and motivates the choice $N_{VQ} = 12$, $N_{PCA} = 8$. The local-PCA architecture performs significantly better than normal PCA (the case $N_{VQ} = 1$), which reaches only a correctness of 86% for hand posture / object classification and 78% for stability judgement (the latter is not plotted in Fig. 8). Increasing $N_{PCA}$ further does not improve the results.
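The "leaving one out" protocol itself is simple: train on all samples but one, test on the held-out sample, repeat for every sample and average. The sketch below shows the loop with scikit-learn's LeaveOneOut; a 1-nearest-neighbour classifier stands in for the retrained VPL, and the random placeholder data merely mimic the 190 labelled windows.

import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier

def loo_recognition_rate(X, y):
    # Train on all samples but one, test on the held-out sample,
    # and average over all folds.
    hits = 0
    for train, test in LeaveOneOut().split(X):
        clf = KNeighborsClassifier(n_neighbors=1).fit(X[train], y[train])
        hits += clf.predict(X[test])[0] == y[test][0]
    return hits / len(X)

# Placeholder data standing in for the 190 labelled camera windows.
rng = np.random.default_rng(4)
X = rng.normal(size=(190, 30))
y = rng.integers(0, 14, size=190)
print(loo_recognition_rate(X, y))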

Figure 8: Recognition rate for the combined hand posture and object classification task (without stability judgement) for different $N_{VQ}$ and $N_{PCA}$. [Plot: recognition rate (40–95%) over the number of reference vectors (0–14), one curve each for 4, 6, 8 and 12 PCs.]

This behaviour is in accordance with previous work [3, 4], where it could be shown that the classification performance of the VPL is "well-behaved" under changes of the three fundamental parameters $N_{VQ}$, $N_{PCA}$ and $N_{LLM}$: the recognition rate increases smoothly in all three parameters until saturation is reached. Over-fitting does not occur for $N_{VQ}$ and $N_{LLM}$, because the AEV-algorithm automatically sorts out under-utilised reference vectors. For $N_{PCA}$, too large values can easily be avoided by watching the eigenvalues.

The different performance of simple PCA and local PCA can easily be explained by the inhomogeneity of the image data (hands versus objects in the gripper), which results in a highly non-linear distribution in pixel space. For comparison, VPL-classifiers with $N_{VQ} = 1$ and $N_{VQ} = 8$ were trained separately for the gesture and the object classification task. In this case, the advantage of the VPL with $N_{VQ} > 1$ over the normal PCA case is much smaller.

4.3 Operating conditions

To achieve good results, the scenario is subject to some conditions. In principle, background and lighting conditions are arbitrary as long as they remain constant: the system adapts to every colour or texture during the training phase, but conditions must be the same in application. For example, even a sharp beam of light from one side can be used as illumination, because the system adapts automatically even to strong shadows. Shadows are just another form of "object appearance" — as long as conditions are fixed.

In practice, however, the requirement of constancy still implies several restrictions. Though background texture could be trained, a homogeneous background must be used, because movement of the robot arm would change the pattern under the camera. For the same reason, homogeneous illumination from several diffused sources is necessary, since the robot arm moves over a considerable range. Another restriction is that the hand of the instructor must have approximately the same size in the camera image for training and application (training for several distances would be quite tiresome). This is brought about by holding the robot hand at a constant and relatively low height over the table, which leaves little range for the hand of the user. Further, the hand must be approximately centred below the camera to be completely visible. This can quite easily be done with a little practice, because the robot fingers indicate the location of the centre. However, we hope to improve this situation by using acoustic feedback for hand centring and recognisability in future work.

5 Conclusion and outlook

We presented a visual recognition system for the end-effector camera of a three-fingered robot hand. This system has to solve different tasks: classification of hand gestures carried out under the end-effector for guidance, recognition of grasped objects and judging grasp stability. It could be shown that the VPL-architecture outlined in section 3.2 is able to represent this image domain, which is achieved by a combination of feature extraction based on local PCA with neural classifiers.

So far, the scenario outlined in section 2 makes use only of the object classification results of the hand camera. Future goals are the development of control strategies which allow the integration of the qualitative gesture recognition into the grasping process, and the design of adequate grasping behaviours that incorporate the feedback from the hand camera.

The future development of the vision system will aim to exploit its representational capabilities by classifying a greater variety of situations. For the hand gestures, we plan the transition from pre-defined "symbolic", non-continuous poses to more natural gestures. The basis will be an image database of test persons who were not instructed to use predefined poses but used their own natural gestures.

The object recognition part will not only be improved with respect to the stability judgement in future work; the hand camera will also be involved in the grasping process itself. This means a new category of situations will be involved: to guide the hand to a suitable grasping position, a judgement of the "grasping situation" will be needed which evaluates the object pose with respect to the finger positions. By this means the grasping process could be optimised and grasp stability increased.
6 Acknowledgement

This work was supported by the Deutsche Forschungsgemeinschaft (DFG) within the collaborative research project SFB 360 "Situated Artificial Communicators".

References

[1] S. C. Ahalt, A. K. Krishnamurthy, P. Chen, and D. E. Melton. Competitive learning algorithms for vector quantization. Neural Networks, 3:277–290, 1990.
[2] S. Grossberg. Competitive learning: From interactive activation to adaptive resonance. Cognitive Sci., 11:23–63, 1987.
[3] G. Heidemann. Ein flexibel einsetzbares Objekterkennungssystem auf der Basis neuronaler Netze. PhD thesis, Univ. Bielefeld, Technische Fakultät, 1998. Infix, DISKI 190.
[4] G. Heidemann and H. Ritter. Combining multiple neural nets for visual feature selection and classification. In ICANN 99, Ninth Int'l Conf. on Artificial Neural Networks, pages 365–370, 1999.
[5] G. Heidemann and H. Ritter. Efficient vector quantization using the WTA-rule with activity equalization. Neural Processing Letters, 13(1):17–30, 2001.
[6] G. Heidemann and H. Ritter. Combining gestural and contact information for visual guidance of multi-finger grasps. In M. Verleysen, editor, Proc. ESANN 02, pages 301–306, Bruges, Belgium, 2002. d-side publications.
[7] J. Jockusch. Exploration based on Neural Networks with Applications in Manipulator Control. PhD thesis, Univ. Bielefeld, Technische Fakultät, 2000.
[8] T. Kohonen. Self-Organizing Maps. Springer-Verlag, 1995.
[9] H. Liu, P. Meusel, and G. Hirzinger. A tactile sensing system for the DLR three-finger robot hand. In Proc. ISMCR 95, pages 91–96, 1995.
[10] T. Martinetz, S. G. Berkovich, and K. Schulten. "Neural-gas" network for vector quantization and its application to time-series prediction. IEEE Trans. on Neural Networks, 4(4):558–569, 1993.
[11] R. Menzel, K. Woelfl, and F. Pfeiffer. The development of a hydraulic hand. In 2nd Conf. on Mechatronics and Robotics, pages 225–238, 1993.
[12] B. Moghaddam and A. Pentland. Probabilistic visual learning for object representation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 19(7):696–710, 1997.
[13] J. Moody and C. Darken. Learning with localized receptive fields. In Proc. of the 1988 Connectionist Models Summer School, pages 133–143. Morgan Kaufmann Publishers, San Mateo, CA, 1988.
[14] H. Murase and S. K. Nayar. Visual learning and recognition of 3-D objects from appearance. Int'l J. Computer Vision, 14:5–24, 1995.
[15] R. Rae, M. Fislage, and H. Ritter. Visuelle Aufmerksamkeitssteuerung zur Unterstützung gestikbasierter Mensch-Maschine-Interaktion. KI – Künstliche Intelligenz, Themenheft Aktive Sehsysteme, (1):18–24, 1999.
[16] H. J. Ritter, T. M. Martinetz, and K. J. Schulten. Neuronale Netze. Addison-Wesley, München, 1992.
[17] T. D. Sanger. Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks, 2:459–473, 1989.
[18] J. Steil, G. Heidemann, J. Jockusch, R. Rae, N. Jungclaus, and H. Ritter. Guiding attention for grasping tasks by gestural instruction: The GRAVIS-robot architecture. In Proc. IROS 2001, 2001.
[19] M. E. Tipping and C. M. Bishop. Mixtures of probabilistic principal component analyzers. Neural Computation, 11(2):443–482, 1999.

Contact

Dr. Gunther Heidemann
Universität Bielefeld
Technische Fakultät, AG Neuroinformatik
Postfach 10 01 31, D-33501 Bielefeld
Tel.: +49 (0)521 106-6056
Fax: +49 (0)521 106-6011
e-mail: [email protected]
WWW: http://www.TechFak.Uni-Bielefeld.DE/ags/ni

Prof. Dr. Helge Ritter
Universität Bielefeld
Technische Fakultät, AG Neuroinformatik
Postfach 10 01 31, D-33501 Bielefeld
Tel.: +49 (0)521 106-6062
Fax: +49 (0)521 106-6011
e-mail: [email protected]
WWW: http://www.TechFak.Uni-Bielefeld.DE/ags/ni

Gunther Heidemann studied at the Universities of Karlsruhe and Münster and received a Ph.D. from Bielefeld University in 1998. He is currently working within the collaborative research project "Hybrid Knowledge Representation" of the SFB 360 at Bielefeld University. His fields of research are mainly computer vision, neural networks and hybrid systems.

Helge J. Ritter studied physics and mathematics at the Universities of Bayreuth, Heidelberg and Munich and received a Ph.D. in physics from the Technical University of Munich in 1988. Since 1985, he has been engaged in research in the field of neural networks. In 1989 he moved as a guest scientist to the Laboratory of Computer and Information Science at Helsinki University of Technology. Subsequently he was an assistant research professor at the then newly established Beckman Institute for Advanced Science and Technology and the Department of Physics at the University of Illinois at Urbana-Champaign. Since 1990 he has been a professor at the Department of Information Science, Bielefeld University. His main interests are principles of neural computation, in particular self-organising and learning systems, and their application to machine vision, robot control, data analysis and interactive man-machine interfaces. In 1999 Helge Ritter was awarded the SEL Alcatel Research Prize, and in 2001 the Leibniz Prize of the German Research Foundation (DFG).