Scale and the Differential Structure of Images
Luc M J Florack, Bart M ter Haar Romeny, Jan J Koenderink and Max A Viergever

Computer Vision Research Group, Utrecht University Hospital, Heidelberglaan 100, 3584 CX Utrecht, The Netherlands
Paper received: 7 February 1992
Image and Vision Computing, 0262-8856/92/006376-13 © 1992 Butterworth-Heinemann Ltd

Why and how one should study a scale-space is prescribed by the universal physical law of scale invariance, expressed by the so-called Pi-theorem. The fact that any image is a physical observable with an inner and outer scale bound necessarily gives rise to a 'scale-space representation', in which a given image is represented by a one-dimensional family of images representing that image on various levels of inner spatial scale. An early vision system is completely ignorant of the geometry of its input. Its primary task is to establish this geometry at any available scale. The absence of geometrical knowledge poses additional constraints on the construction of a scale-space, notably linearity, spatial shift invariance and isotropy, thereby defining a complete hierarchical family of scaled partial differential operators: the Gaussian kernel (the lowest order, rescaling operator) and its linear partial derivatives. They enable local image analysis through the detection of local differential structure in a robust way, while at the same time capturing global features through the extra scale degree of freedom. In this paper we show why the operations of scaling and differentiation cannot be separated. This framework permits us to construct in a systematic way multiscale, cartesian differential invariants, i.e. true image descriptors that exhibit manifest invariance with respect to a change of cartesian coordinates. The scale-space operators closely resemble the receptive field profiles found in mammalian front-end visual systems.

Keywords: scale-space, Gaussian kernel, Gaussian derivatives, differential invariants

Over the last few years there has been an increasing tendency in the image analysis literature towards a multiscale approach. A historical contribution to such an approach was the introduction of the pyramid [1]. Though based on a rather ad hoc method of averaging neighbouring pixels, this first model did capture the crucial observation of the inherently multiscale character of image structure.

For some time there has been discussion on the question of how to generate a scale-space, the continuous analogue of the pyramid, in a unique way, as there seemed to exist no clear way to choose among the many possible scale-space filters [2,3]. One obviously needed a set of natural, a priori scale-space constraints.

A fundamental approach was adopted by Koenderink [4], Witkin [5] and Yuille and Poggio [6], who formulated an a priori constraint in the form of a causality requirement: no 'spurious detail' should be generated upon increasing scale. This, together with some symmetry constraints, unambiguously established the Gaussian kernel (i.e. the Green's function of the isotropic diffusion equation) as the unique scale-space filter. Its width $\sigma$ can be identified with spatial scale.

One can model an image as a scalar field on a finite-dimensional manifold and apply fundamental mathematical operations, like differentiation, to reveal local image structure. There exist many useful and rather well-established mathematical disciplines, notably differential geometry, tensor calculus and invariant theory, all of which have an increasing impact on present-day image structure analysis.

In this paper we discuss the fundamental concept of scaling as well as some natural constraints of a front-end visual system, and show that a complete hierarchical set of scaled differential operators follows from these considerations. The lowest order kernel is the isotropic Gaussian. The higher order kernels are the scaled Gaussian derivatives, which constitute the natural differential operators on a given scale. With this set we can study local image geometry to any desired order. To this end we will introduce the concept of a local jet of order $N$, $J^N[L(P)]$, also called the $N$-jet [7], defined as the equivalence class of functions $L$ which share the same $N$-truncated Taylor expansion at a given point $P$. In other words, all images in a given $N$-jet are locally indistinguishable modulo higher order differences. Such a local $N$-jet can be represented with respect to a cartesian coordinate system by the set of partial derivatives up to $N$th order, evaluated at the point $P$, so:

$$J^N[L(P)] = \{ L_{i_1 \ldots i_n}(P) \}_{n=0}^{N} \qquad (1)$$

The lower spatial indices attached to $L$ all have values within the range $1, \ldots, D$, where $D$ is the dimension of the image domain, and denote differentiation with respect to the associated spatial variable.

Derivatives of arbitrary order are generally well-defined and robust provided they can be calculated on a sufficiently high scale (relative to pixel scale and noise correlation width), and provided we have a sufficient resolution of intensity values (dynamic resolution, noise). We will not present a detailed discussion of these trade-offs here, but refer to Blom et al. [8]. In this paper we will restrict ourselves to $N \le 3$.

The approach is valid in $D$ dimensions, whereas much of the literature is limited to 1 or 2 dimensions [1,2,4,5].

THEORY

Physical versus mathematical operators

The only way to obtain structural information about a physical scene is to extract observables (i.e. images) with the help of some measuring apparatus. We inevitably have to face the problem of fixing the proper scale, because observables are always characterized by an intrinsic, finite scale range. Its lower bound is determined by the sampling characteristics of the device, whereas the upper bound is limited by the scope of the field of view.

The very fact that an image is a physical observable makes it subject to an extra constraint imposed by the universal law of scale invariance, which governs all laws of physics. There is no such scaling constraint on a mathematical, i.e. dimensionless, scalar field defined on a dimensionless manifold, but it is instructive to observe how mathematicians alternatively constrain it by imposing convenient regularity conditions: a mathematical function is typically assumed to be 'sufficiently smooth', say a $C^N(\Omega)$-function on a $D$-dimensional domain $\Omega$, with $N$ sufficiently large to justify the operations performed on it. For a physical observable we cannot impose such smoothness constraints.

Clearly, it makes no sense to define a derivative of a sampled image in the strict mathematical sense (this would require the existence of an infinitesimal neighbourhood as well as a smoothness constraint on neighbouring image values). One usually circumvents this problem by considering neighbouring pixels instead of infinitesimal neighbourhoods in the definition of a derivative. A well-known example of this is the 5-point Laplacean kernel […]. This is, however, a non-robust and rather ad hoc solution that crucially relies on imaging conditions, like grid size and pixel shape. Using this operational Laplacean amounts to the implicit assumption that the structures of interest have a spatial extent close to pixel scale. Moreover, it assumes that the structures of this scale are meaningful, which is generally not the case (think of pixel-correlated noise or dithered images).

Disregarding the intrinsic dimensionality of an image or, in other words, the scaling degree of freedom, is the […] necessity of a multiscale approach and to derive the unique scale-space operators for arbitrary dimensions $n > 1$.

Basic front-end vision constraints

Many interpretations of a front-end vision system are possible. We assume that its sole task is to establish a representation of a given observable in a convenient format. The interpretation is left to dedicated postprocessing routines, which read out the formatted data represented by the front-end (cf. the 'sensorium' in Koenderink [13]). By definition, a front-end vision system is assumed to be completely ignorant of any a priori geometry of its input. This lack of a priori geometrical knowledge argues for an a priori symmetric sampling and preprocessing of its input. Hence it is quite natural to define a front-end vision system by formulating a set of plausible symmetries. We propose the following set*:

• linearity: allowing for superposition of input stimuli.
• spatial shift invariance: implied by the absence of a preferred location.
• isotropy: implied by the absence of a preferred direction.
• scale invariance: implied by the absence of a preferred scale.

These basic symmetry requirements are rather weak, because we do not want the front-end system to commit itself to any specific task beyond representation. Note that none of these symmetry constraints are strictly necessary for the sole purpose of data representation, but they do significantly decrease the burden on interpreting routines that address the front-end, since these will now be relieved of the overhead of having to reconcile the data with the symmetries of the environment that are known in advance anyway: the front-end system will make this a priori knowledge of the environment manifest. In this precise sense, the front-end postulates provide a convenient format.

*There may be asymmetries in the external environment the system has to operate in.

Scale invariance

Let $F(x_1, \ldots, x_n)$ be some physical observable, e.g. the image luminance as a function of spatial coordinates, time, etc. From a purely mathematical point of view there is no restriction whatsoever on the form of the function $F$. But because we are dealing with a physical entity, the requirement of scale invariance imposes a restriction on the form of $F$: only those functions are allowed that 'scale properly'. The precise meaning of this statement is expressed by the following