Active Perception
RUZENA BAJCSY, MEMBER, IEEE
Invited Paper

Manuscript received November 23, 1987; revised March 21, 1988. This work was supported in part by NSF Grant DCR-8410771, Air Force Grant AFOSR F49620-85-K-0018, Army Grant DAAG-29-84-K-0061, NSF-CER Grant DCR82-19196A02, DARPA/ONR, NIH Grant NS-10939-11 as part of the Cerebro Vascular Research Center, NIH Grant 1-R01-NS-23636-01, NSF Grant INT85-14199, NSF Grant DMC85-17315, ARPA Grant N0014-85-K-0807, NATO Grant 0224/85, and by DEC Corporation, IBM Corporation, and LORD Corporation. The author is with the Computer and Information Science Department, University of Pennsylvania, Philadelphia, PA 19104, USA. IEEE Log Number 8822793.

Active Perception (Active Vision specifically) is defined as a study of modeling and control strategies for perception. By modeling we mean models of sensors, processing modules, and their interaction. We distinguish local models from global models by their extent of application in space and time. The local models represent procedures and parameters such as optical distortions of the lens, focal length, spatial resolution, band-pass filter, etc. The global models, on the other hand, characterize the overall performance and make predictions on how the individual modules interact. The control strategies are formulated as a search for the sequence of steps that would minimize a loss function while one is seeking the most information. Examples are shown as an existence proof of the proposed theory: obtaining range from focus and stereo vergence, 2-D segmentation of an image, and 3-D shape parametrization.

I. INTRODUCTION

Most past and present work in machine perception has involved extensive static analysis of passively sampled data. However, it should be axiomatic that perception is not passive, but active. Perceptual activity is exploratory, probing, searching; percepts do not simply fall onto sensors as rain falls onto ground. We do not just see, we look. And in the course, our pupils adjust to the level of illumination, our eyes bring the world into sharp focus, our eyes converge or diverge, we move our heads or change our position to get a better view of something, and sometimes we even put on spectacles. This adaptiveness is crucial for survival in an uncertain and generally unfriendly world, as millennia of experiments with different perceptual organizations have clearly demonstrated. Yet no adequate account, theory, or example of active perception has been presented by machine perception research. This lack is the motivation for this paper.

II. WHAT IS ACTIVE SENSING?

In the robotics and computer vision literature, the term "active sensor" generally refers to a sensor that transmits (generally electromagnetic radiation, e.g., radar, sonar, ultrasound, microwaves, and collimated light) into the environment and receives and measures the reflected signals. We believe that the use of active sensors is not a necessary condition for active sensing, and that sensing can be performed with passive sensors (that only receive, and do not emit, information) employed actively. Here we use the term active not to denote a time-of-flight sensor, but to denote a passive sensor employed in an active fashion, purposefully changing the sensor's state parameters according to sensing strategies.

Hence the problem of Active Sensing can be stated as a problem of controlling strategies applied to the data acquisition process, where the strategies depend on the current state of the data interpretation and on the goal or the task of the process; a minimal sketch of such a loop is given below. The question may be asked, "Is Active Sensing only an application of Control Theory?" Our answer is: "No, at least not in its simple version." Here is why:

1) The feedback is performed not only on sensory data but on complex processed sensory data, i.e., various extracted features, including relational features.
2) The feedback is dependent on a priori knowledge: models that are a mixture of numeric/parametric and symbolic information.

But one can say that Active Sensing is an application of intelligent control theory, which includes reasoning, decision making, and control. This approach has been eloquently stated by Tenenbaum [1]: "Because of the inherent limitation of a single image, the acquisition of information should be treated as an integral part of the perceptual process . . . Accommodation attacks the fundamental limitation of image inadequacy rather than the secondary problems caused by it." Although he uses the term accommodation rather than active sensing, the message is the same.
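To make the control-loop reading concrete, the following is a minimal sketch (in Python, not from the paper) of active sensing as such a loop: the next sensor state is chosen from the current interpretation and the task goal, and acquisition continues only while the goal is unmet. The sensor, interpreter, goal, and expected-loss interfaces are hypothetical placeholders standing in for the local and global models discussed below.

    # A minimal sketch of the control-loop reading of active sensing.
    # The interfaces (sensor.configure/acquire, interpreter.update/expected_loss,
    # goal.satisfied) are hypothetical, not part of the paper.
    def active_sensing_loop(sensor, interpreter, goal, candidate_states, max_steps=20):
        interpretation = interpreter.initial_state()
        for _ in range(max_steps):
            if goal.satisfied(interpretation):
                break
            # Pick the sensor state predicted to be most informative, i.e., the
            # one that minimizes the expected loss given what is currently known.
            next_state = min(
                candidate_states,
                key=lambda s: interpreter.expected_loss(s, interpretation, goal))
            sensor.configure(next_state)   # e.g., change focus, vergence, or zoom
            data = sensor.acquire()
            interpretation = interpreter.update(interpretation, data)
        return interpretation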
The implications of the active sensing approach are the following:

1) The necessity of models of sensors. This is to say, first, the model of the physics of the sensors as well as of their noise; second, the model of the signal processing and data reduction mechanisms that are applied to the measured data. These processes produce parameters with a definite range of expected values plus some measure of uncertainties. These models shall be called Local Models.
2) The system (which mirrors the theory) is modular, as dictated by good computer science practices, and interactive, that is, it acquires data as needed. In order to be able to make predictions on the whole outcome, we need, in addition to models of each module (as described in 1) above), models for the whole process, including feedback. We shall refer to these as Global Models.
3) Explicit specification of the initial and final state/goal.

If the Active Vision theory is a theory, what is its predictive power? There are two components to our theory, each with certain predictions:

1) Local models. At each processing level, local models are characterized by certain internal parameters. Examples of local models can be: a region growing algorithm whose internal parameters are the local similarity measure and the size of the local neighborhood; another example is an edge detection algorithm whose parameter is the width of the bandpass filter with which one is detecting the edge effect. These parameters predict a) the definite range of plausible values, and b) the noise and uncertainty, which will determine the expected resolution and sensitivity/robustness of the output results from each module. Following the edge detection example, from the width of the bandpass filter we can predict how close two objects can be before their boundaries will merge, i.e., what is the minimum separation distance (see the sketch at the end of this section). The same parameter will predict what details of the contour are detectable, and so on.

2) Global models characterize the overall performance and make predictions on how the individual modules will interact, which in turn will determine how intermediate results are combined. The global models also embody the global/external parameters and the initial and final/global state of the system. The basic assumption of the Active Vision approach is the inclusion of feedback into the system and the gathering of data as needed. The global model represents all the explicit feedback connections, parameters, and the optimization criteria which guide the process.

Finally, all the predictions have been or will be tested experimentally. Although all our analysis applies to the modality of touch as well, we shall limit ourselves only to visual sensing, hence the name Active Vision (AV).
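The sketch referred to in the local-models paragraph above is given here. It assumes a Gaussian-smoothed derivative edge detector of width sigma (in pixels) and a rule-of-thumb merging constant k of about 2; both the constant and the numbers in the example are illustrative assumptions, not values from the paper.

    # Toy local-model prediction: two step edges closer than about k*sigma merge
    # into one response. k ~ 2 is an illustrative rule of thumb for a
    # Gaussian-smoothed derivative detector, not a value taken from the paper.
    def min_separation_px(sigma, k=2.0):
        """Predicted minimum separation (pixels) before two boundaries merge."""
        return k * sigma

    def min_separation_scene(sigma, pixel_size_mm, magnification, k=2.0):
        """Same prediction mapped into scene units via a simple imaging scale."""
        return min_separation_px(sigma, k) * pixel_size_mm / magnification

    # Example: sigma = 3 px, 0.01 mm pixels, 0.05x magnification
    # -> about 6 px in the image, i.e., roughly 1.2 mm in the scene.
    print(min_separation_px(3.0))
    print(min_separation_scene(3.0, 0.01, 0.05))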
III. PREVIOUS WORK

To our knowledge, very little work has been performed [...] a processor system on a MULTIBUS, with a MC68000, a Matrix frame grabber and buffer, an array processor, dedicated image processing units, and a programmable transform processor. Image positioning is achieved with a pan/tilt head and a motorized zoom/focus lens. Although this is a powerful and flexible system [...]. Weiss [4] describes a model reference adaptive control feedback system that uses image features (areas, centroids) as feedback control signals.

Kuno et al. [5] report a stereo camera system whose interocular distance, yaw, and tilt are computer controlled. The cameras are mounted on a specially designed linkage. The distance between the two cameras is controlled: the larger the distance, the more precisely disparity information from stereo can be converted to absolute distance; the smaller the distance, the easier the solution to the correspondence problem (a numerical sketch of this tradeoff appears at the end of this section). From the article, it is not clear how this flexibility will be exploited, nor how the 3 degrees of freedom will be controlled.

Recently the Rochester group [6] has reported a mobile stereo camera system, as has Poggio at MIT [7]. Both these groups are interested in modeling biological systems rather than in the machine perception problems.

One approach to formulating sensing strategies is described by Bajcsy and Allen [8]. But in that paper, control is predicated on feedback from different stages of object recognition. Such feedback will not be available from the processes determining spatial layout, which do not perform recognition.

Recently a very relevant paper to our subject matter appeared at the First International Conference on Computer Vision, by Aloimonos and Bandyopadhyay [9], with the title "Active Vision." They argue that "problems that are ill-posed, nonlinear or unstable for a passive observer become well-posed, linear or stable for an active observer." They investigated five typical computer vision problems: shape from shading, shape from contour, shape from texture, structure from motion, and optic flow (area based). The principal assumption that they make is that the active observer moves with a known motion and, of course, has available more than one view, i.e., more data. Hence, it is not surprising that with more measurements taken in a controlled fashion, ill-posed problems get converted into well-posed problems.

We differ, however, with Aloimonos and Bandyopadhyay in the emphasis on Active Vision (or perception in general) as a scientific paradigm. For us the emphasis is on the study of modeling and control strategies for perception, i.e., modeling of the sensors, the objects, the environment, and the interaction between them for a given purpose, which can be manipulation, mobility, or recognition.
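The baseline tradeoff noted for the Kuno et al. system, and the broader point that controlled extra measurements improve the conditioning of a problem, can be made concrete with a small numerical sketch. The pinhole stereo relation Z = fB/d is standard, but the focal length, disparity noise, baselines, and view counts below are made-up illustrative values, not numbers from any of the cited systems.

    import math

    # Depth from disparity: Z = f*B/d, so a disparity error dd maps to a depth
    # error of roughly Z**2 / (f*B) * dd. A larger baseline B gives more precise
    # depth; averaging N independently controlled views cuts the error by ~sqrt(N).
    def depth_error(Z, f_px, baseline_m, disparity_err_px, n_views=1):
        single_view = (Z ** 2) / (f_px * baseline_m) * disparity_err_px
        return single_view / math.sqrt(n_views)

    Z, f_px, dd = 2.0, 800.0, 0.5      # 2 m target, 800 px focal length, 0.5 px noise
    print(depth_error(Z, f_px, 0.10, dd))             # wide baseline: ~0.025 m error
    print(depth_error(Z, f_px, 0.02, dd))             # short baseline: ~0.125 m error
    print(depth_error(Z, f_px, 0.02, dd, n_views=9))  # more controlled views: ~0.042 m

The same numbers illustrate why a short baseline is attractive for correspondence (small disparities, small search range) yet costly in depth precision, and why acquiring additional, deliberately chosen views recovers much of that precision.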